In my last post on Google Print—since rechristened Google Book Search—I raised a particular law and economics concern about the project (see Fair Use and Inefficient Bundling). That post generated a flurry of comments—thanks!—and now I want to try a different approach to assessing the project: the world of the law school hypothetical. Try two different versions of what Google Book Search is doing to evaluate the actual project.
Google Index: In truth, for most books, the index at the back of the book leaves much to be desired, which is academic speak for "the average index stinks." In the rush to finish the book, the author or the publisher slaps a list of terms and page numbers together, and voilà, the index is done. Larry and Sergey know this, so they announce the Google Index project. Google will scour the world for the best indexers, promise them as much free chicken-apple sausage as they can eat, and give them each a stack of books. Read the book, create an index, and put the index online. (We could also imagine a wiki-index project if you prefer less centralization.)
Google Index would be searchable using the standard Google search structure and could be advertising supported. But here is how Google Index would differ from Google Book Search. Google Book Search returns one of three results depending on the copyright access Google has to the work. In some cases, you get the whole book, and your search takes you to the relevant page. In other cases, you can only move back and forward a couple of pages from the found page. And in the narrowest result, what Google calls the “snippet view,” you see only fragments of text—the search term in a limited text context—and the page numbers associated with that text. That view includes a link addressing the missing text. Says Google:
We respect copyright law and the tremendous creative effort authors put into their work. So, unless any given book’s publisher has given us permission to show sample pages, you'll only be able to see the Snippet View which, like a card catalog, shows information about the book plus a few snippets—a few sentences of your search term in context. If the book isn’t under copyright at all, you can browse the entire book in the Full Book View, but the aim of Google Book Search is to help you discover books and learn where to buy or borrow them, not read them from start to finish. It’s like going to a bookstore and browsing—with a Google twist.
The hypothetical Google Index would take one step back from the snippet view. It would return just the basic info on the book—author, title, ISBN and perhaps a link to Amazon to buy the book—and the page number relevant to the search, just like a paper index.
Google Digital Index: Version 2 of the hypo. Enough with breakfast, says Google. Instead of human indexers, Google takes physical copies of the books, digitizes them, sics high-end software on the digital copies, and produces an index for the books. Google destroys the digital copies, returns the physical books to the libraries, and opens for business. Again, a search on Google Digital Index generates only author/title/ISBN info and the page number in the physical book relevant to the search.
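For readers who think better in code, here is a minimal Python sketch of what the Google Digital Index hypo describes. It is an illustration only, not a description of anything Google actually runs. Every name in it (BookIndex, build_index, search) is hypothetical, the "high-end software" is reduced to a plain word-to-page-number lookup, and clearing the in-memory page text merely stands in for destroying the interim digital copy.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class BookIndex:
    """What survives after indexing: basic book info plus term -> page numbers."""
    author: str
    title: str
    isbn: str
    pages_by_term: dict = field(default_factory=dict)  # no page text is kept


def build_index(author, title, isbn, scanned_pages):
    """Build an index from scanned_pages (page number -> page text).

    The scanned text is the interim copy in the hypo; it is discarded
    once the page numbers for each term have been recorded.
    """
    postings = defaultdict(set)
    for page_number, text in scanned_pages.items():
        for term in re.findall(r"[a-z0-9']+", text.lower()):
            postings[term].add(page_number)
    index = BookIndex(author, title, isbn,
                      {term: sorted(pages) for term, pages in postings.items()})
    scanned_pages.clear()  # assumption: stands in for destroying the digital copy
    return index


def search(indexes, term):
    """Return only author/title/ISBN and matching page numbers, never text."""
    term = term.lower()
    results = []
    for idx in indexes:
        pages = idx.pages_by_term.get(term)
        if pages:
            results.append({"author": idx.author, "title": idx.title,
                            "isbn": idx.isbn, "pages": pages})
    return results
```

Note what the sketch makes concrete: the search result carries no text from the book at all, only the bibliographic info and page numbers, which is the one step back from the snippet view that the hypo is built around.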
Where does this put Google? Is the index a derivative work? Does the presence of interim copies in the second version matter (I think Bill Patry says no)? Does the fair use analysis change if page numbers are returned as a search result rather than limited amounts of actual text plus page numbers as in the snippet view?