lemm.ee


neidu2@feddit.nl

14 points

3 months ago


I’m intrigued. The search results are more akin to how they used to be 25 years ago, on the internet that I loved.
https://search.marginalia.nu is definitely something I’ll be exploring going forward!


Rai@lemmy.dbzer0.com

4 points

3 months ago

I searched for Dance Dance Revolution and ended up here.

The absolute nostalgia of it all. Love it.


ggtdbz@lemmy.dbzer0.com

8 points

3 months ago

Replying under the top comment, but this really applies to all of these: how do these search engines determine what counts as a personal site? For example, I had procrastinated for years on finally spinning up a static, barren HTML blog. The infamous Lucidity AI post introduced me to Mataroa, and I got over the hump and started writing. Would that get indexed? Etc.

Does it just crawl through webrings?


oploskoffie@feddit.nl

4 points

3 months ago

I believe you have to submit your own website to this one for manual addition to its index.


dch82@lemmy.zipOP

6 points

3 months ago

That is exactly what I needed; the subdomains are now in my bookmarks.


zutto@lemmy.fedi.zutto.fi


35 points

3 months ago

Teclis - Includes search results from Marginalia, free to use at the moment. This search index has been closed down in the past due to abuse.

Kagi, which created Teclis, is a paid search engine (a metasearch engine, to be more precise) that also incorporates these search results into its normal searches. I warmly recommend giving Kagi a try; it’s great, and I’ve been enjoying it a lot.

–

Other options I can recommend: you could always try to host your own search engine if you have a list of small-web sites in mind or don’t mind spending some effort collecting such a list. I personally host YaCy [github link] (and SearXNG to interface with YaCy and several other self-hosted indexes/search engines, such as Kiwix wikis). Indexing and crawling your own search results is, surprisingly, not resource-heavy at all, and can be run on your personal machine in the background.

https://help.kagi.com/kagi/search-details/search-sources.html


troed@fedia.io

12 points

3 months ago

Not just a meta search engine though - they do have their own index as well.


zutto@lemmy.fedi.zutto.fi


5 points

3 months ago

Yes, I mentioned Kagi because the Teclis search index is hosted by them.

However, most of the search results in Kagi are aggregated from dedicated search engines (such as, but not limited to, Yandex, Brave, Google, and Bing).


phanto@lemmy.ca

2 points

3 months ago

I tried running YaCy for a while, but it would run for a bit less than a day, then run out of memory and crash, over and over. I tried to figure out the problem, but it’s niche enough that I couldn’t get anywhere googling the issue.


zutto@lemmy.fedi.zutto.fi


2 points

3 months ago

This is a bit off-topic, but did you try to increase the JVM limits inside Yacy’s administration panel?

Spoilering to hide wall of text related to this topic.

This setting, located on the /Performance_p.html page, gives the Java runtime more memory, for example. The same page also has other RAM-related settings, such as how much memory YaCy must leave unused for the system. (These settings exist so that people who run YaCy on their personal machines can have guaranteed resources for more important stuff.)

Another thing that would reduce memory usage is limiting the concurrency of the crawler, for example. There are quite a lot of tunable settings that can affect memory usage. I would recommend hitting up one of the YaCy forums, which are also a good place to ask questions. The Matrix channel (and IRC) are a bit dead, but there are a couple of people there, including myself!

Also, there are new docs written by the community; they might help as well: https://yacy.net/docs/ https://yacy.net/operation/performance/


phanto@lemmy.ca

2 points

3 months ago

Yeah, I did try that. Basically, if I doubled the memory I allocated, it ran half again as long before it crashed, but it still crashed eventually.

It’s no big deal, this was last year, I may try again one day. Loving Searxng though!


Sl00k@programming.dev

2 points

3 months ago

Personally really been enjoying Kagi for the past year.


Liam Mayfair@lemmy.sdf.org


34 points

3 months ago

Try this engine

https://search.marginalia.nu/

Or a SearXNG instance

https://search.disroot.org/search

You may also be interested in the Indie Web movement. This site is a great resource for it, with yet more links to indie sites and blogs.

Finally, not quite what you asked but here’s a freebie, in case you didn’t know about it:

https://wiby.me/

It’s an old web search engine. It only indexes pages from the 00s and earlier.


houseofleft@slrpnk.net

6 points

3 months ago

Ah, Marginalia is absolutely awesome! I feel like modern search is almost an extension of website names now, so if I want to find Netflix but don’t know its website, I might search for “netflix”. Marginalia is actually a cool way to find new stuff: you can search “bike maintenance” and find cool blog posts about that topic.

I honestly can’t remember if that’s something Google and the like used to do but don’t now, or if they never did. Either way, I love it!


Omniraptor@lemm.ee

2 points

3 months ago


This is how Google started out; until around 2010-2015 it was wonderful. I think it’s just losing the SEO slop arms race now, tbh.


Daemon Silverstein@thelemmy.club


5 points

3 months ago


Aside from SearXNG, I didn’t know about these search engines until your recommendation. Thanks to Wiby and Marginalia, I found old rich content (old BBS list conversations, for example) that I was looking for, regarding studies on the occult and esotericism. Thank you so much!


𝕽𝖚𝖆𝖎𝖉𝖍𝖗𝖎𝖌𝖍@midwest.social


28 points

3 months ago

This is a great question, in that it made me wonder why the Fediverse hasn’t come up with a distributed search engine yet. I can see the general shape of a system, and it’d require some novel solutions to keep it scalable while still allowing reasonably complex queries. The biggest problems with search engines are that they’re all scanning the entire internet and generating a huge percentage of all internet traffic; they’re all creating their own indexes, which is computationally expensive; their indexes are huge, which is space-expensive; and quality query results require a fair amount of computing resources.

The shape I have in mind is a distributed search engine with something like a DHT for the index, with partitioning and replication, and a moderation system to control bad actors and trojan nodes. DDG and SearX are sort of front ends for a system like this, except that they just hand off the queries to one (or two) of the big monolithic engines.
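As a rough illustration of what "a DHT for the index, with partitioning and replication" could mean, here is a minimal Python sketch. It assumes a Kademlia-style XOR distance to decide which nodes hold the posting list for a term; the `ToyIndex` class, the node names, and the replication factor `k` are all hypothetical, and a real system would also need networking, churn handling, and the moderation layer mentioned above.

```python
import hashlib


def key_id(text: str) -> int:
    """Map a string to a 160-bit integer ID using SHA-1."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big")


def replicas_for(term: str, nodes: list[str], k: int = 3) -> list[str]:
    """Return the k nodes whose IDs are closest to the term's ID by XOR distance."""
    target = key_id(term)
    return sorted(nodes, key=lambda n: key_id(n) ^ target)[:k]


class ToyIndex:
    """Hypothetical in-memory stand-in for a network of index nodes."""

    def __init__(self, nodes, k=3):
        self.nodes = list(nodes)
        self.k = k
        # node name -> term -> set of URLs (each node holds only its partition)
        self.postings = {n: {} for n in self.nodes}

    def add(self, term, url):
        """Store the URL under the term on every replica responsible for it."""
        for node in replicas_for(term, self.nodes, self.k):
            self.postings[node].setdefault(term, set()).add(url)

    def lookup(self, term, offline=frozenset()):
        """Ask the responsible replicas in order, skipping nodes that are offline."""
        for node in replicas_for(term, self.nodes, self.k):
            if node not in offline:
                return self.postings[node].get(term, set())
        return set()
```

With ten nodes and `k=3`, each term lives on three nodes only, so no node needs the whole index, and a lookup still succeeds while at least one of those three is reachable.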

ColinHayhurst@lemmy.world


8 points

3 months ago


We’d love to build a distributed search engine, but it would be too slow, I think. When you send us a query, we go and search 8 billion+ pages and bring back the top 10, 20…up to 1,000 results. For a good service we need to do that in 200ms, and thus one needs to centralise the index. It took years, several iterations, and our carefully designed algos & architecture to make something so fast. No doubt Google, Bing, Yandex & Baidu went through similar hoops. Maybe I’m wrong and/or someone can make it work with our API.
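To make the latency argument concrete, here is a minimal Python sketch of a scatter-gather query, where each shard returns its own top results and a coordinator merges them. It is only an illustration under assumptions: the shard layout, the `shard_top_k` and `search` helpers, and the term-count scoring are all hypothetical and are not how any engine mentioned in this thread works.

```python
import heapq
from itertools import islice


def shard_top_k(shard, query_terms, k):
    """Score pages in one shard by how many query terms they contain.

    Returns up to k (score, url) pairs, best first.
    """
    scored = []
    for url, text in shard.items():
        words = set(text.lower().split())
        score = sum(1 for t in query_terms if t in words)
        if score:
            scored.append((score, url))
    return heapq.nlargest(k, scored)


def search(shards, query, k=10):
    """Query every shard, then merge the per-shard lists into a global top k."""
    terms = query.lower().split()
    # In a networked system these would be parallel remote calls.
    partials = [shard_top_k(shard, terms, k) for shard in shards]
    # Each partial list is already sorted best first, so a k-way merge suffices.
    merged = heapq.merge(*partials, reverse=True)
    return [url for _, url in islice(merged, k)]
```

In a networked version the coordinator cannot finish the merge until every shard has answered, so the slowest shard sets the response time, which is one way to read the case for keeping the index in one place.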


invertedspear@lemm.ee

9 points

3 months ago

I think 200ms is an expectation of big tech. I know people have very little patience these days, but if you provided better quality searches in 5 seconds people would probably prefer that over a .2 second response of the crap we’re currently getting from the big guys. Even better if you can make the wait a little fun with some animations, public domain art, or quotes to read while waiting.


bitjunkie@lemmy.world

2 points

3 months ago

if you provided better quality searches in 5 seconds people would probably prefer that over a .2 second response of the crap we’re currently getting from the big guys

This is precisely what made me switch to ChatGPT as my primary “search engine”. Even DDG is fucking useless these days if you need anything more complex than a list of popular sites that contain a couple of keywords.


𝕽𝖚𝖆𝖎𝖉𝖍𝖗𝖎𝖌𝖍@midwest.social


1 point

2 months ago

I’m designing off the top of my head, but I think you could do it with a DHT, or even just steal some distributed ledger algorithm from a blockchain. Or you could develop a distributed skip tree. But you’re right: any sort of distributed query is going to have a possibly unacceptable latency. So you might, like Bitcoin, distribute the index itself to participants (which could be large), but federate the indexing operation such that, rather than a dozen different search engine crawlers hitting each web site, you’d have one or two crawlers per site feeding the shared index.

Distributed search engines have existed for over a decade. Several solutions for distributed Lucene clusters exist (Solr, Katta, Elasticsearch, O2), and while they’re mostly designed to be run in a LAN where the latency between nodes is small, I don’t think it’s impossible to imagine a fairly low-latency distributed, replicated index where the nodes have a small subset of peer nodes which, together, encompass the entire index. No instance has the same set of peer nodes, but the combined index is eventually consistent.

Again, I’m thinking more about federating and distributing the index-building, to reduce web sites being hammered by search engines which constitute 80% of their traffic. Federating and distributing the query mechanism is a harder problem, but there’s a lot of existing R&D in this area, and technologies that could be borrowed from other domains (the aforementioned DHT and distributed ledger algorithms).
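One way the "one or two crawlers per site" idea could be coordinated without a central scheduler is rendezvous (highest-random-weight) hashing, sketched below in Python. This is an assumption about how it might be done, not something proposed in the thread: the `weight` and `crawlers_for` helpers and the crawler names are hypothetical.

```python
import hashlib


def weight(crawler: str, site: str) -> int:
    """Deterministic pseudo-random weight for a (crawler, site) pair."""
    digest = hashlib.sha256(f"{crawler}|{site}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def crawlers_for(site: str, crawlers: list[str], per_site: int = 2) -> list[str]:
    """Pick the per_site crawlers with the highest weight for this site.

    Every participant that knows the same crawler list computes the same answer,
    so no coordinator is needed to avoid duplicate crawling.
    """
    ranked = sorted(crawlers, key=lambda c: weight(c, site), reverse=True)
    return ranked[:per_site]
```

A property that suits a federation where instances come and go: if a crawler that was not assigned to a site leaves, that site's assignment is unchanged, and if an assigned crawler leaves, only its slot is taken over by the next-ranked crawler.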


obbeel@lemmy.eco.br

6 points

3 months ago

I thought Gigablast was a one-man company? Yet it had good search results and it was expansive.


ColinHayhurst@lemmy.world


6 points

3 months ago

Yes, it was. Matt Wells closed it down just over one year ago.


BelatedPeacock@lemmy.world


5 points

3 months ago


YaCy is probably what you’re looking for


𝕽𝖚𝖆𝖎𝖉𝖍𝖗𝖎𝖌𝖍@midwest.social


4 points

3 months ago

Yah, it does. I’ve come across it before, but it rode in on a wave of alternative search engines and got lost in the shuffle.

Thanks.


troed@fedia.io

22 points

3 months ago

You’re looking for Kagi.com

Not only does it give better search results quality wise on “the big web” - you can select to search specific parts, like blogs.

Best part - it’s completely ad and spam free. You pay for it with actual money instead of with your data.


18 points

3 months ago


Why not run a SearXNG instance and help everyone instead? Y’know, Kagi is pretty expensive and they are also getting into AI shit.


troed@fedia.io

11 points

3 months ago

I’m hoping that, just as Proton does good free stuff using the money I pay them (Visionary account), Kagi does/will do the same. The Internet as a whole needs to stop being ad-supported.


4 points

3 months ago


I refuse to believe Proton when they do advertisements lol. They are also being pretty suspicious by ignoring XMR support despite years of people requesting it. If they had ever considered it even a bit, their new shit Proton Wallet wouldn’t allow you to store (or only store) bitcoin, which we all know has nothing that protects your privacy.


1 point

3 months ago

The Internet as a whole needs to stop being ad-supported.

I’m with you to an extent but it also makes me consider what my online experience would have been if I needed money to do anything online. The internet was a huge part of my childhood and I definitely didn’t have money to spend on it.

We barely had enough to get internet when I was ~10yrs old and it was much later when we got something better than dial up.


4 points

3 months ago

I’ve signed up for the €5 a month subscription at Kagi and I’ve never used my whole quota.

Granted I expect it’s overly expensive if you live in a developing country like Eritrea or the United States


2 points

3 months ago

5 euros a month for 300 searches. Definitely not worth it. I live in Germany.


3 points

3 months ago

Can you expand on how running your own SearXNG helps others? Does it contribute to some shared index or something?
