Any “small-web” search engines?

posted 1 month ago

I use DuckDuckGo, but I've realised these big(ish) search engines mostly give me commercialised results. DuckDuckGo has been going downhill for years, though not as fast as Google or Bing.

I want to have a search engine that gives me all the small blogs and personal sites.

Does something like this exist?


𝕽𝖚𝖆𝖎𝖉𝖍𝖗𝖎𝖌𝖍@midwest.social

28 points

1 month ago

This is a great question, in that it made me wonder why the Fediverse hasn’t come up with a distributed search engine yet. I can see the general shape of a system, and it’d require some novel solutions to keep it scalable while still allowing reasonably complex queries. The biggest problems with search engines are that they’re all scanning the entire internet and generating a huge percentage of all internet traffic; they’re all creating their own indexes, which is computationally expensive; their indexes are huge, which is space-expensive; and quality query results require a fair amount of computing resources.

What I have in mind is a distributed search engine with something like a DHT for the index, with partitioning and replication, and a moderation system to control bad actors and trojan nodes. DDG and SearX are sort of front ends for a system like this, except that they just hand off the queries to one (or two) of the big monolithic engines.
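To make the DHT idea a bit more concrete, here is a minimal sketch of how index terms could be partitioned and replicated across nodes with a consistent-hash ring. Everything in it is an assumption for illustration: the `TermRing` class, the node names, the replica count and the use of SHA-1 are hypothetical, and no existing project is claimed to work this way. Moderation and node churn are left out entirely.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring (SHA-1 is used only for even spread)."""
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)


class TermRing:
    """Hypothetical consistent-hash ring: each index term lives on `replicas` nodes."""

    def __init__(self, nodes, replicas=2, vnodes=64):
        nodes = sorted(set(nodes))
        if replicas > len(nodes):
            raise ValueError("need at least as many nodes as replicas")
        self.replicas = replicas
        # Each node gets several virtual points so terms spread more evenly.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def owners(self, term: str):
        """Walk clockwise from the term's hash, collecting distinct nodes."""
        start = bisect.bisect(self._points, _hash(term))
        found = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in found:
                found.append(node)
                if len(found) == self.replicas:
                    break
        return found


ring = TermRing(["node-a", "node-b", "node-c"], replicas=2)
print(ring.owners("small-web"))  # two distinct node names
```

The point of the ring is that when a node leaves, only the terms it held need a new home; terms stored on other nodes stay where they are.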


ColinHayhurst@lemmy.world

8 points

1 month ago

We’d love to build a distributed search engine, but I think it would be too slow. When you send us a query we go and search 8 billion+ pages, and bring back the top 10, 20…up to 1,000 results. For a good service we need to do that in 200ms, and thus one needs to centralise the index. It took years, several iterations and our carefully designed algos & architecture to make something so fast. No doubt Google, Bing, Yandex & Baidu went through similar hoops. Maybe I’m wrong and/or someone can make it work with our API.


invertedspear@lemm.ee

9 points

1 month ago

I think 200ms is an expectation of big tech. I know people have very little patience these days, but if you provided better quality searches in 5 seconds people would probably prefer that over a .2 second response of the crap we’re currently getting from the big guys. Even better if you can make the wait a little fun with some animations, public domain art, or quotes to read while waiting.


bitjunkie@lemmy.world

2 points

1 month ago

> if you provided better quality searches in 5 seconds people would probably prefer that over a .2 second response of the crap we’re currently getting from the big guys

This is precisely what made me switch to ChatGPT as my primary “search engine”. Even DDG is fucking useless these days if you need anything more complex than a list of popular sites that contain a couple of keywords.


𝕽𝖚𝖆𝖎𝖉𝖍𝖗𝖎𝖌𝖍@midwest.social

1 point

16 days ago

I’m designing off the top of my head, but I think you could do it with a DHT, or even just steal some distributed ledger algorithm from a blockchain. Or you develop a distributed skip tree. But you’re right, any sort of distributed query is going to have a possibly unacceptable latency. So you might, like Bitcoin, distribute the index itself to participants (which could be large), but federate the indexing operation so that, rather than a dozen different search engine crawlers hitting each web site, you’d have one or two crawlers per site feeding the shared index.
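As a rough illustration of "one or two crawlers per site", participants could agree on who crawls what without any coordinator by using rendezvous (highest-random-weight) hashing. This is only a sketch under assumptions: `crawlers_for`, the crawler names and the choice of SHA-256 are hypothetical and not taken from any existing system.

```python
import hashlib


def crawlers_for(site: str, crawlers: list[str], per_site: int = 2) -> list[str]:
    """Pick the crawlers responsible for `site` by rendezvous hashing.

    Every participant that knows the same crawler list computes the same
    answer independently, so no central scheduler is needed.
    """
    def weight(crawler: str) -> int:
        digest = hashlib.sha256(f"{crawler}|{site}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    # The crawlers with the highest weights for this site win the assignment.
    return sorted(crawlers, key=weight, reverse=True)[:per_site]


fleet = ["crawler-1", "crawler-2", "crawler-3", "crawler-4"]
print(crawlers_for("example.org", fleet))  # two crawler names from the fleet
```

If a crawler drops out, only the sites it was assigned to get a replacement; every other assignment is unchanged.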

Distributed search engines have existed for over a decade. Several solutions for distributed Lucene clusters exist (Solr, Katta, Elasticsearch, O2), and while they’re mostly designed to be run in a LAN where the latency between nodes is small, I don’t think it’s impossible to imagine a fairly low-latency distributed, replicated index where each node has a small subset of peer nodes which, together, encompass the entire index. No two instances have the same set of peer nodes, but the combined index is eventually consistent.
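The query side of such a setup usually comes down to scatter-gather: send the query to every shard, let each return its local top results, then merge. Here is a toy sketch with in-memory dictionaries standing in for peer nodes and plain term frequency standing in for real ranking. The `local_top_k` and `search` functions and the sample shards are hypothetical, and real network latency (the actual hard part) is not modelled.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def local_top_k(shard: dict[str, str], term: str, k: int):
    """Score one shard's pages by how often `term` occurs; return its best k hits."""
    hits = ((text.lower().split().count(term), url) for url, text in shard.items())
    return heapq.nlargest(k, (hit for hit in hits if hit[0] > 0))


def search(shards: list[dict[str, str]], term: str, k: int = 10) -> list[str]:
    """Fan the query out to all shards in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = pool.map(lambda shard: local_top_k(shard, term, k), shards)
        merged = heapq.nlargest(k, (hit for part in partials for hit in part))
    return [url for _, url in merged]


shards = [
    {"a.example": "cats cats dogs"},
    {"b.example": "cats", "c.example": "birds"},
]
print(search(shards, "cats", k=2))  # ['a.example', 'b.example']
```

Because each shard only ships back k hits, the merge step stays cheap; the total response time is bounded by the slowest shard, which is why LAN-oriented clusters have an easier time than nodes spread across the internet.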

Again, I’m thinking more about federating and distributing the index-building, to reduce web sites being hammered by search engines, which constitute 80% of their traffic. Federating and distributing the query mechanism is a harder problem, but there’s a lot of existing R&D in this area, and technologies that could be borrowed from other domains (the aforementioned DHT and distributed ledger algorithms).


obbeel@lemmy.eco.br

6 points

1 month ago

I thought Gigablast was a one-man company? Yet it had good search results and it was expansive.


ColinHayhurst@lemmy.world

6 points

1 month ago

Yes, it was. Matt Wells closed it down just over one year ago.


BelatedPeacock@lemmy.world

5 points

1 month ago

YaCy is probably what you’re looking for


𝕽𝖚𝖆𝖎𝖉𝖍𝖗𝖎𝖌𝖍@midwest.social

4 points

1 month ago

Yah, it does. I’ve come across it before, but it rode in on a wave of alternative search engines and got lost in the shuffle.

Thanks.


Technology

!technology@lemmy.world
