For many computer novices, this frustrating scenario is what it’s like to search for information on the Web. Today, the primary tool for finding material is the search engine, which is simply a software program that crawls through the Web and makes an inventory of what’s out there. The Web now contains some 1.2 billion pages and roughly doubles in size each year. Without search engines, we’d never be able to find all the exciting new stuff that’s added to the Net every day, about 38 pages per second.
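A crawler's basic job, following links outward from a starting page and recording what it finds, can be sketched in a few lines. This is a toy illustration under stated assumptions: the "Web" here is a hypothetical in-memory dictionary, not real pages, and no actual engine's crawler is this simple. Note that the page nobody links to is never found, which is one reason no engine indexes everything.

```python
from collections import deque

# Hypothetical miniature Web: each URL maps to (page text, outgoing links).
TOY_WEB = {
    "home.example": ("welcome to our home page", ["news.example", "about.example"]),
    "news.example": ("daily news", ["home.example", "sports.example"]),
    "about.example": ("about us", []),
    "sports.example": ("sports scores", ["news.example"]),
    "orphan.example": ("nobody links here", []),
}

def crawl(web, seed):
    """Breadth-first crawl from a seed URL, recording each reachable page once.
    Returns an inventory mapping URL -> page text, in the order pages were fetched."""
    inventory = {}
    queue = deque([seed])
    seen = {seed}
    while queue:
        url = queue.popleft()
        if url not in web:  # broken link: nothing to fetch
            continue
        text, links = web[url]
        inventory[url] = text
        for link in links:
            if link not in seen:  # queue each URL only once
                seen.add(link)
                queue.append(link)
    return inventory

inventory = crawl(TOY_WEB, "home.example")
print(list(inventory))  # "orphan.example" is never reached
```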
Everyone agrees that search engines are necessary, but one study has shown that seven of 10 Internet users are dissatisfied with them. Who hasn’t done what he or she thought was a fairly straightforward keyword search, only to be deluged with hundreds of results, all of them useless? The reasons for the bad mojo are complicated. The Web is enormous; there’s incredibly useful information to be found but, conversely, a ton of junk to be avoided. People aren’t very good at using search engines. And, worse, computers aren’t very good at figuring out what people really mean when they type in “book + car + Ford.” Companies have been slogging away at the problems for years, and the hard work is starting to bear fruit. Right about now, Web surfers everywhere should start to notice a difference: searching is smarter than ever before.
There are two basic approaches to searching: man or machine. The machine approach holds that technology can perform a crushing volume of work that no human being could accomplish in the same amount of time. Machines never sleep and don’t take breaks for lunch. Companies like AltaVista, Inktomi, Excite and Northern Light are the heavy hitters of massive number-crunching. A new search engine called All the Web has taken the position that size really does matter. “There are gems hidden out there that would have appeared on the first or second page of results if the search engine had only known about them,” says David Burns of FAST Search, the Norwegian company behind All the Web, currently the largest index at 200 million pages.
The other strategy has been to harness the best supercomputer ever invented: the human brain. The theory here is that no piece of technology can beat the ability of a regular person to select the highest-quality Web pages and then discard the rest. Less is more. This is why Yahoo! employs some 150 editors and Web surfers to create what it calls a “directory,” which categorizes a total of 1.2 million links to Web pages in ever-narrowing hierarchies of specificity. You get fewer results from a Yahoo search, but you’re more likely to find at least some value in all of them. Other companies have followed Yahoo’s pioneering instincts: LookSmart (www.looksmart.com), Ask Jeeves (ask.com) and Netscape’s Open Directory (dmoz.org) all use human beings in more or less the same way.
Of course, most of these companies are trying to combine technology and human intelligence in ways that outstrip the ability of either approach alone to solve particularly difficult search problems. One of the most stubborn problems for all search companies has been predicting what people really mean when they type in cryptic search terms like “chips” or “saturn.” The planet? The car? The Sega videogame console? Something no one’s heard of before? In other words, how do you get inside the head of your user? As one search-engine executive wryly notes, “The ESP module has not yet been invented.”
Electrodes to the temples may not be a crazy idea, considering how difficult search can be. Fewer than 6 percent of surfers manage to use Boolean search operators: the words “and” and “or” and the plus and minus signs, which mirror the way a computer filters data. Microsoft researchers found that the most frequently occurring search query was a fragment of a URL, which suggested to the company’s analysts that people couldn’t tell a search engine from a browser. “Sometimes they were using a search engine to search for the very same search engine,” says Microsoft’s Yusuf Mehdi.
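How a computer filters pages with those plus and minus signs can be shown with a toy inverted index, a table mapping each word to the pages that contain it. The sketch below is illustrative only: the four pages are hypothetical, and the rules assumed here (“+” means the word must appear, “-” means it must not, and plain words match if any of them appears) are a common convention, not any particular engine’s exact syntax.

```python
from collections import defaultdict

# Hypothetical toy collection: page name -> page text.
PAGES = {
    "page1": "used ford car for sale",
    "page2": "book review of a classic car novel",
    "page3": "car repair book for ford owners",
    "page4": "cookbook of family recipes",
}

def build_index(pages):
    """Map each lowercase word to the set of pages containing it (an inverted index)."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index

def boolean_search(query, index, all_pages):
    """Evaluate a plus/minus query: +term is required (AND), -term is excluded (NOT),
    and bare terms match if at least one of them is present (OR)."""
    required, excluded, optional = [], [], []
    for token in query.lower().split():
        if token.startswith("+"):
            required.append(token[1:])
        elif token.startswith("-"):
            excluded.append(token[1:])
        else:
            optional.append(token)

    results = set(all_pages)
    for term in required:  # AND: keep only pages containing every required word
        results &= index.get(term, set())
    if optional:  # OR: keep pages containing at least one bare word
        any_optional = set()
        for term in optional:
            any_optional |= index.get(term, set())
        results &= any_optional
    for term in excluded:  # NOT: drop pages containing any excluded word
        results -= index.get(term, set())
    return sorted(results)

index = build_index(PAGES)
print(boolean_search("+book +car -ford", index, PAGES))
print(boolean_search("book car ford", index, PAGES))
```

The first query returns only the car-novel review; the second, with no operators, returns every page mentioning any of the three words, which is exactly the deluge novices complain about.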
Ask Jeeves takes an interesting approach to the problem of vague queries: it asks users to pose their query as a natural-language question. The staffers at Ask Jeeves then generate a list of related questions to help you locate a satisfactory answer among the Web pages they found. For example, the query “Will the Cleveland Indians win the World Series?” resulted in five related questions, including “Where can I find expert predictions on the 1999 American League?” Clicking on this follow-up question is equivalent to saying, “Yes, that’s what I’d like to know.” The result? CBS Sportsline’s American League Predictions page, on which two baseball aficionados pick the Indians as this year’s world champions. Similarly, Simpli.com, a new search engine that won’t launch until November, gives you pull-down menus to zero in on your meaning. If you typed in “java,” it would ask you to specify whether you meant the programming language, the Indonesian island, coffee beans or “other.”
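The pull-down idea amounts to a lookup table of known meanings: offer the user the senses of an ambiguous word, then add words tied to the chosen sense before searching. The sketch below is a hypothetical illustration, not Simpli.com’s actual system; only the “java” menu labels come from the example above, and the extra keywords attached to each sense are invented.

```python
# Hypothetical sense inventory for ambiguous one-word queries. The labels mirror the
# "java" example; the extra keywords for each sense are invented for illustration.
SENSES = {
    "java": {
        "programming language": ["programming", "language"],
        "Indonesian island": ["indonesia", "island"],
        "coffee beans": ["coffee", "beans"],
    },
}

def menu_choices(term):
    """Return the choices a pull-down menu would offer for a term, plus 'other'."""
    return list(SENSES.get(term.lower(), {})) + ["other"]

def refine_query(term, choice):
    """Expand the query with the keywords tied to the chosen sense.
    Choosing 'other' (or an unknown sense) leaves the query unchanged."""
    extra = SENSES.get(term.lower(), {}).get(choice, [])
    return " ".join([term] + extra)

print(menu_choices("java"))
print(refine_query("java", "coffee beans"))
```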
Another conundrum for the industry is how to bring quality Web pages to surfers without sacrificing breadth of coverage. One good solution is the “popularity” engine developed by a Wellesley, Mass.-based startup called Direct Hit. Its logic is ingeniously simple: popularity is a very good indicator of quality. “As millions of surfers interact with a search engine, we anonymously monitor what sites they are going to, and how long they spend there,” explains cofounder and chairman Gary Culliss. “So the database gets smarter as we go along, and the good stuff floats to the top.”
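Direct Hit’s exact formula isn’t described here, but the principle Culliss outlines, that pages people click on and stay at should rise, can be sketched as follows. Assumptions: the click log is hypothetical, and scoring a page by total seconds spent on it is a simplification chosen for illustration.

```python
from collections import defaultdict

# Hypothetical anonymized click log: (query, url clicked, seconds spent on the page).
CLICK_LOG = [
    ("saturn", "saturn-cars.example", 120),
    ("saturn", "nasa-saturn.example", 300),
    ("saturn", "nasa-saturn.example", 240),
    ("saturn", "spam-page.example", 3),
    ("saturn", "spam-page.example", 2),
]

def popularity_scores(log, query):
    """Sum the seconds users spent on each URL clicked for this query.
    Quick bounces add little, so pages people actually read score higher."""
    scores = defaultdict(float)
    for q, url, seconds in log:
        if q == query:
            scores[url] += seconds
    return dict(scores)

def rerank(results, log, query):
    """Order a result list by accumulated popularity; sorted() is stable,
    so pages with equal scores keep their original order."""
    scores = popularity_scores(log, query)
    return sorted(results, key=lambda url: -scores.get(url, 0.0))

results = ["spam-page.example", "saturn-cars.example", "nasa-saturn.example", "unvisited.example"]
print(rerank(results, CLICK_LOG, "saturn"))
```

Each new click adds to the log, so, as Culliss puts it, the database gets smarter as it goes along.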
A new engine called Google judges the quality of a Web page by how many other Web pages link to it; the presumption is that people don’t link to bad Web pages. MSN is improving its search by having “hundreds” of employees spot search trends that bubble up in the query logs. Now that it’s football season, says Microsoft’s Mehdi, “a search for ‘bears’ will probably mean the Chicago Bears, and this week people might be more interested in buying tickets than finding general information.” The database is tweaked daily to reflect these changes in the Zeitgeist.
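The link-counting idea can be sketched by tallying, for each page, how many other pages point to it. This follows the simple description above, not Google’s full method: Google’s published PageRank algorithm goes further and weights each link by the importance of the page it comes from. The link graph below is hypothetical, and ignoring self-links and repeated links is an assumption made so a page can’t vote for itself.

```python
from collections import Counter

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "a.example": ["c.example", "b.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example", "c.example", "d.example"],
}

def inbound_link_counts(links):
    """Count how many distinct other pages link to each page.
    Self-links and repeated links from the same page are ignored."""
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):
            if target != source:
                counts[target] += 1
    return counts

def rank_by_links(pages, links):
    """Sort pages by inbound-link count, most-linked first (stable for ties)."""
    counts = inbound_link_counts(links)
    return sorted(pages, key=lambda page: -counts[page])

print(rank_by_links(["d.example", "b.example", "a.example", "c.example"], LINKS))
```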
Which search engine is “best”? It’s impossible to crown one winner. But there are some general rules of thumb: big search engines like AltaVista and All the Web are best for obscure searches because casting a wider net increases your likelihood of finding anything at all. Novices should go to the big portals like Excite, Yahoo and Lycos, all of which target mainstream users and are thus optimized for ease of use. Danny Sullivan, who edits an excellent guide to Web search at SearchEngineWatch.com, encourages surfers to try them all. “Relevancy,” he says, “is a hard thing to lock down. One person may like a search engine; another may hate it.” With these new engines, there’s a lot more to like.