What's in this box?

We take them for granted now, those ubiquitous little rectangles. They've become so common, such a part of the argot of modern life that they often don't need labels or "submit" buttons.

Search boxes, of course, are a bit more complicated than they seem. As with many things, a simple search box masks a wildly complex mechanism -- at least if it works well.

Whatever its shortcomings, the most famous of search boxes, the Google search box, is certainly simple. That simplicity seems at odds with its almost clairvoyant ability to find what you were looking for. The elegant simplicity does leave room for some complexity through the "advanced search" screen or modifiers like site: or inurl:. Still, the results never seem to be the same twice, and Google is constantly tweaking the math that drives them: fighting link farmers and overzealous search engine "optimization" and adding new data types like images, videos and geographic information.

Library search boxes, on the other hand, haven't changed much since the advent of the online catalog, so named because it was an online version of the card catalog, itself a huge leap forward from the bound catalogs that preceded it and the scrolls that preceded them. Libraries have not spent a great deal of time or effort improving search results, and often our first inclination as librarians is to make the search box (or boxes, or entire screen) more complicated. This can lead to better results, what librarians call "greater precision," but it demands quite a bit of precision on the part of the searcher as well. More on precision another time, perhaps.

Part of what makes today's simpler search boxen work is what librarians call high recall. Which is to say they find a lot of stuff. Mountains of stuff. Things that have nothing to do with what you want at all. What makes this work is the ability of the math driving the search box to move the things that are the closest matches to the top of the result list. And that's where the magic is: in the ability of math to determine relevance from a soup of words.
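For the curious, here is a minimal sketch of how math can pull relevance out of a soup of words. It uses a simple TF-IDF score, a textbook technique and certainly not what Google or any particular catalog actually runs. The `rank` function and the `SAMPLE_DOCS` collection are made up for illustration.

```python
import math
from collections import Counter

# Hypothetical three-item "collection", invented for this example.
SAMPLE_DOCS = {
    "Card catalogs": "a history of the library card catalog",
    "Search engines": "how search engines rank search results",
    "Cooking": "recipes for soup and stew",
}


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def rank(query, documents):
    """Return (score, title) pairs, highest score first.

    A term counts for more when it is frequent in a document (TF)
    and rare across the whole collection (IDF).
    """
    tokenized = {title: tokenize(body) for title, body in documents.items()}
    n_docs = len(tokenized)

    # Document frequency: how many documents contain each word.
    doc_freq = Counter()
    for words in tokenized.values():
        doc_freq.update(set(words))

    results = []
    for title, words in tokenized.items():
        counts = Counter(words)
        score = 0.0
        for term in tokenize(query):
            if counts[term] == 0:
                continue  # term not in this document
            tf = counts[term] / len(words)
            idf = math.log(n_docs / doc_freq[term]) + 1.0
            score += tf * idf
        results.append((score, title))

    # Closest matches float to the top of the result list.
    results.sort(key=lambda pair: pair[0], reverse=True)
    return results


ranked = rank("search results", SAMPLE_DOCS)
```

Every document comes back (high recall), but only the ones that share words with the query get a score above zero, so the likeliest match ends up first.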

So library catalogs have traditionally had high precision and, within the bounds of that precision, very good recall. (Typos and missing or misfiled cards could prevent recall even with perfect search terms, and you were generally limited to three subjects per book.) Library catalogs have, until quite recently, lacked any sort of relevance ranking. This, it could be argued, is because librarians painstakingly select each item in the library collection, so what you find is what we deem best. (I can hear you bristling now, clever students and brilliant scholars all.)
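If the two terms are still fuzzy, the standard definitions fit in a few lines. This is only a sketch: the function name and the item labels are hypothetical, and real evaluations need someone to decide which items count as relevant in the first place.

```python
def precision_and_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision: share of what you got back that you actually wanted.
    recall:    share of what you wanted that you actually got back.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four items in the collection are truly relevant.
relevant_items = {"a", "b", "c", "d"}

# A narrow, catalog-style search: everything found is relevant, but half is missed.
narrow = precision_and_recall({"a", "b"}, relevant_items)

# A broad, single-box search: nothing is missed, but half the results are noise.
broad = precision_and_recall({"a", "b", "c", "d", "e", "f", "g", "h"}, relevant_items)
```

The narrow search trades recall for precision and the broad search does the reverse, which is why a broad search leans so heavily on ranking to be usable.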

In today's world this simply will not do: first, you can and should judge sources for yourself, and second, we have too many items in too many disciplines from too many sources to suggest that you can search them all with perfect precision. Besides, perfect precision doesn't work at all with full-text sources. And you want to search full-text sources, don't you?

Since I'm well past making a long story short, I'll simply say that we do get it. Search is hard to do well, but we are trying. Take a look at Yufind, our experimental catalog ("discovery tool" is the buzzword among the libraryland cognoscenti), and let us know what you'd like to see. We're working on it.

About

This page contains a single entry from the blog posted on October 17, 2008 5:00 PM.

The previous post in this blog was Book Couture: Yale Library Treasures For Fashion Week.

The next post in this blog is Copyright in the Digital Age.


Send comments to Katie Bauer