I promised a post a few days ago about distributed search engines, but I've been dilly-dallying about it. It's the holidays, we're all full of turkey and cookies.
In my earlier post, I fretted about how Google and other centralized search services like it had become a bottleneck to finding information online, and could therefore become a tempting target in the drive to regulate (and even censor) Internet content. But there is a more powerful, positive argument to make in favor of distributed search engines — people are assembling their own collections of information, in the form of websites, discussion groups, blogs, and more traditional forms of writing, but there is still no way to selectively search this content. You can go to Google and search the entire Internet, or you can use a variety of rudimentary search tools on your own computer or on individual public websites. What you can't do is say "search the New York Times, the blogs in my blogroll, and the Wayback Machine for documents similar to the email message I just sent". A distributed system would fill that middle ground.
Right up front it's important to say that peer-to-peer search engines wouldn't be intended to replace centralized services like Google, any more than weblogs have supplanted large news or commentary sites like Salon or the New York Times. Instead, they would serve the same purpose as weblogs do, which is to create neighborhoods for specialized information, and make it easy to find, join, and participate in niche communities of knowledge.
Mena Trott mentions a phenomenon that you can often see by monitoring your referrer logs - a post on an arcane topic will become the hub of a little universe of interest. In her case, an attached discussion became the locus for a whole little special-interest group, with visitors coming in via Google, answering one another's questions and keeping the post 'alive' outside the context of the weblog itself.
A peer-to-peer search engine would make such microcommunities easier to find, and easier to sustain. Instead of relying on an Internet-wide portal like Google, you would run searches through a personal search client; this could be a Web application, or a more fully-featured desktop application, like a blog aggregator. The client would let you seek out searchable collections through a kind of meta-search, akin to the way Gnutella and other file-sharing networks discover new nodes, and create "search lists" of interesting sites to send queries to, much like an iTunes playlist. You could also keep a list of favorite queries, which you would periodically send out to chosen blocks of search engines, to find newly added material.
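To make the idea concrete, here is a minimal sketch of such a client in Python. The specifics are assumptions made for illustration: the post posits only "a standardized API", so the /search endpoint, the q parameter, and the JSON response shape below are all hypothetical.

```python
# A hypothetical personal search client: a "search list" of sites plus
# saved favorite queries. The /search endpoint and response shape are
# invented for illustration; the post assumes only that some standard
# web services protocol exists.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Your "search list" -- sites you have chosen, like an iTunes playlist.
search_list = [
    "http://example-blog.net",
    "http://example-archive.org",
]

# Favorite queries, re-run periodically to catch newly added material.
favorite_queries = ["peer-to-peer search"]

def query_node(base_url, query):
    """Ask one small search engine for hits on a query."""
    url = base_url + "/search?" + urlencode({"q": query})
    with urlopen(url) as response:
        # Assumed response shape: {"hits": [{"url": ..., "score": ...}, ...]}
        return json.load(response)["hits"]

def run_favorites():
    """Send each saved query to every node on the search list."""
    for query in favorite_queries:
        for node in search_list:
            for hit in query_node(node, query):
                print(node, hit["url"])
```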
Queries would go out to each little search engine, which would return its results through a standardized API (most likely a web services protocol) as a ranked list of relevant hits. The client could then recombine those into a single ranked list, and allow you to do all the usual post-filtering — exact phrase matches, sorting by date, and everything else we're used to being able to do in a decent search engine.
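A sketch of that recombination step, continuing with the hypothetical field names from above: since independent engines won't score relevance on a common scale, each node's scores are normalized locally (here, naively, against its own top hit) before the lists are interleaved, and post-filters run over the merged list.

```python
# Client-side recombination of per-engine result lists into one ranking.
# The "score" and "snippet" fields are assumptions carried over from the
# earlier sketch; dividing by each node's top score is one naive way to
# make independent engines' rankings comparable.
def merge_results(per_node_hits):
    """per_node_hits maps a node URL to that node's list of ranked hits."""
    merged = []
    for node, hits in per_node_hits.items():
        top = max((h["score"] for h in hits), default=1.0) or 1.0
        for h in hits:
            merged.append({**h, "node": node, "score": h["score"] / top})
    return sorted(merged, key=lambda h: h["score"], reverse=True)

def filter_exact_phrase(hits, phrase):
    """One example post-filter: keep hits whose snippet contains the phrase."""
    return [h for h in hits if phrase.lower() in h.get("snippet", "").lower()]
```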
The net result would be a search network whose topology is just as interesting as the current network of hyperlinks, and clever people would find clever ways to combine the two to make it even easier to find and join interesting conversations.
This is truly a job for the LazyWeb - the technical hurdles are not that great, and the blogging community can be the first to benefit from a working system. Then, when Google puts up the mandatory 700-pixel portrait of John Ashcroft on its homepage and removes the search box, we'll at least have something to fall back on.