
November 18, 2005


Tim Lee

This is a great thought experiment! I think your two index examples would both pretty clearly be fair use. The courts have upheld the creation of "intermediate copies" (in Kelly v. Arriba Soft and Sega v. Accolade, for example), in cases where the creation of the copies was an essential step toward a use that is ultimately fair. This was a recognition, I think, that making copies is what computers do. If you apply a simplistic "no copying" rule to the digital world, you end up prohibiting a lot of otherwise fair uses based solely on the fact that they create intermediate copies as an incidental part of the operation of the software. I don't think that makes a lot of sense.

So I think most people would agree that the "digital index" version--in which the intermediate copy is discarded--is fair use. The more difficult question is whether it's a fair use if that copy is kept, not distributed, but used to create "snippets." I think the answer ought to be yes, for roughly the same reasons as I mentioned above: the copies are still "intermediate" in the sense that they are never made available in full to human beings, and the use to which they are being put (displaying snippets) is likely to be a fair use itself.

Here's something else to consider: depending on the internal format Google uses, there might not be much practical difference between an "index" and a "digital copy." If Google Print's Index included every word in the book along with a list of every spot in the book where that word appeared, you might be able to reconstruct the book from the index. (After all, a really good digital index should be able to search for words adjacent to each other, and that's impossible if the index merely contains words and page numbers.) What's the difference between a comprehensive index (which could be used to reconstruct and display the book if you had the right software) and a "digital copy" (which human beings can't see without the right software)?
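
To make that concrete, here is a minimal Python sketch of a toy positional index, assuming one simple design: each word maps to every word offset where it occurs. The function names build_positional_index, phrase_search, and reconstruct are hypothetical illustrations, not anything Google is known to use. Because the index keeps positions, it can answer adjacent-word queries, and that same data is enough to put the text back together.

```python
from collections import defaultdict


def build_positional_index(text: str) -> dict[str, list[int]]:
    """Map each word to every position (word offset) where it occurs."""
    index: dict[str, list[int]] = defaultdict(list)
    for position, word in enumerate(text.split()):
        index[word].append(position)
    return dict(index)


def phrase_search(index: dict[str, list[int]], phrase: str) -> list[int]:
    """Return start positions where the words of `phrase` appear consecutively."""
    words = phrase.split()
    if not words or any(w not in index for w in words):
        return []
    # Position sets for the 2nd, 3rd, ... words of the phrase.
    following = [set(index[w]) for w in words[1:]]
    return [
        start
        for start in index[words[0]]
        if all(start + offset + 1 in positions
               for offset, positions in enumerate(following))
    ]


def reconstruct(index: dict[str, list[int]]) -> str:
    """Rebuild the original word sequence using nothing but the index."""
    slots: dict[int, str] = {}
    for word, positions in index.items():
        for position in positions:
            slots[position] = word
    return " ".join(slots[i] for i in sorted(slots))


book = "it was the best of times it was the worst of times"
idx = build_positional_index(book)
print(phrase_search(idx, "it was the"))  # [0, 6]
print(reconstruct(idx) == book)          # True
```

A real index would probably normalize case and strip punctuation, so what comes back is the word sequence rather than a facsimile of the page, but the author's ordering of the words survives intact.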

So it's not clear there's even a clean distinction to be drawn between a "digital copy" and an "index." It would probably be a bad idea for fair use determinations to hinge on the precise format of the index.

Joe Miller

Provocative hypotheticals!
I think the argument for treating your Google Index entries as derivative works of the books indexed is weak. I suppose the argument would be something like, "The Index material includes creative expression (key words and phrases) copied from the indexed work. The fact that the copied creative expression is rearranged as an index is of no moment." The problem here is that a typical index entry is just a word or two (or three) long. Can the book author claim a protectable interest in creative expression that is two or three words long? In other contexts (e.g., cases about book titles or advertising phrases), courts have rejected such copyrightability claims. Perhaps the book author is on stronger ground against your Google Index because so many bits of expression are copied cumulatively in the Index (after all, the more comprehensive the index, the better).
Tim Lee raises, in a sense, a turbocharged version of that last point: If your Google Digital Index were powerful (e.g., allowing searches for phrases, even long phrases, and not just words), couldn't one reconstruct the book from the index? In the limit, isn't a digital index just the book in a different form? I suppose it matters whether, given what's in the (hypothetical) database, I can get more than merely a list of the pages on which a given word or short phrase occurs. Moreover, even if I could get a list of all the words that appear on a given page, it seems I wouldn't yet have the book author's creative expression (because I don't know the order in which the author put them).
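
The contrast can be sketched the same way. Assuming a hypothetical index that records only which page each word appears on (no positions), the most you can recover for a page is its vocabulary: two pages containing the same words in a different order produce identical index entries, so the author's ordering is lost. As above, the function names are illustrative only, not a description of how Google stores its data.

```python
from collections import defaultdict


def build_page_index(pages: list[str]) -> dict[str, set[int]]:
    """Map each word to the set of page numbers (1-based) on which it appears."""
    index: dict[str, set[int]] = defaultdict(set)
    for page_number, page_text in enumerate(pages, start=1):
        for word in page_text.split():
            index[word].add(page_number)
    return dict(index)


def words_on_page(index: dict[str, set[int]], page_number: int) -> list[str]:
    """Recover a page's vocabulary (sorted); word order and repeat counts are gone."""
    return sorted(word for word, pages in index.items() if page_number in pages)


original = ["the dog bit the man"]
scrambled = ["the man bit the dog"]
print(words_on_page(build_page_index(original), 1))
# ['bit', 'dog', 'man', 'the']
print(build_page_index(original) == build_page_index(scrambled))
# True: the index cannot tell the two pages apart
```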

D Conrad

The talk about indexing is interesting, in that it highlights confusion over the basic idea of copyright. Some people think of copyright as a control right to prevent people from using a work at all (Bridgeport). Others think copyright is a market-protection scheme, and there is yet another view that copyright protects instantiations of the identifiable elements of a work.

I think the last view is the most accurate; indexes don't take over the purpose of the work, and therefore don't seem objectionable.

It is true that each bit of a work is part of the work, but the parts aren't equal. If I tell you about a book I read involving young people who discover they are wizards and have to fight off evil without letting the normal people know they exist, you'll probably think of Harry Potter. But that description fits several books; those elements are not protectable. It's tough to draw a line as to what is and isn't protected, but an index seems to fall short.

I like your example of taking all the words and scrambling the order; it illustrates the idea that some elements of a work matter more than others, and that the protected content is something more than the sum of its parts.

Finally, I don't see that the ability to copy or obtain a work (by stitching together Google book searches) makes a difference. It's possible to copy a rented DVD, but Blockbuster isn't culpable for copyright violation if I make a copy. Besides, the barrier to getting a book in digital form is fairly low: it takes just one person to spread copies, and this happens within a day or two for books like Harry Potter.

Cory Hojka

My understanding of the Google Digital Index as presented is that Google would scan the entire book, run the algorithms, and then ditch the digital copy. As a result, I think there is a problem with the second hypo that is not present in the first.

The problem is, essentially: why should Google be allowed to create complete digital copies in order to create indexes? After all, do human authors require the full text to create an index? I do not believe so, because an index entry generally refers to only a limited portion of text. Thus, copying the work in its entirety could pose fair use problems, since the portion copied is larger than necessary for the task.

While someone might think Kelly v. Arriba Soft is applicable to this question, that view would be in error, as the technologies and processes in Kelly and in this hypo are considerably different. Kelly was about digital images, which, due to compression algorithms, can only be viewed and shrunk once the whole file is copied and interpreted. In comparison, there is no such compression algorithm to contend with when examining books, nor is anyone likely to need more than a handful of paragraphs or pages at a time to create an index.

More importantly, while the Google Digital Index hypothetical assumes the copies are destroyed, let’s be honest: complete digital copies of books are pretty tempting things to keep around. Even if Google (or its competitors) effectively eliminates the digital copies on its servers, there is always a possibility that an employee (or an employee of a competing service) will share a copy with a friend, who shares it with another friend, who does the same, and so on until everyone’s friend on the Internet has a copy. Therefore, this hypo raises exactly the problem Lichtman pointed out with Google Book Search: the process involved poses a considerable risk of digital piracy of protected works.

Equally important to our consideration is that, in creating an index from a book, there is likely no fair use benefit from creating complete digital copies of books when smaller portions will suffice. Regardless of its legality, Google Book Search requires that Google retain complete digital copies, but no such need likely exists for the Google Digital Index. As a result, fair use should not protect an entirely unnecessary process that creates such considerable risks to copyright holders, even if the end result of the process is itself non-infringing.