Last week, the Authors Guild filed suit against Google, arguing that Google's new "Google Print" project infringes copyright. The blogosphere has been abuzz on the topic, with (among others) Jack Balkin, Tim O'Reilly, and Larry Lessig weighing in strongly on Google's side. But I'm not so sure, for reasons I want to develop a bit here.
For those of you who haven't followed this one, the question is whether Google can build a searchable index that allows queries against a database of print books that Google itself scans in. Google has already begun the scanning, and Google's basic position is that it can scan any book it wants, even without the copyright holder's permission. In legal terms, Google's contention is that it is fair use to scan a book without permission if the purpose of the scan is to then create a search engine that will return only snippets of the work.
I have several concerns with such an argument.
First, even if Google's specific use does not harm copyright holders -- and I doubt that, and will say more on that point below -- I worry about security issues. Think about it. Google is about to build a giant database that will have electronic copies of millions of books. Isn't that a huge target for some hacker who will want to then put those books online? If the books get out, that would obviously hurt sales for the relevant authors. Thus, at a minimum, shouldn't any fair use defense be contingent on some sort of obligation to maintain reasonable security?
Second, even if the books are secure, shouldn't authors have a right to some of the value created here? Yes, yes, I know -- copyrighted works create lots of value, and only some of it goes to the relevant copyright holders. But Google's fans all argue that online database access is going to be the main way books are used in the future. If that's right, and I suspect it is, shouldn't we be developing a legal system that allows authors to share in that revenue stream? (Are there types of books for which this argument is better than others? For instance, I take it that very few copies of Bartlett's Quotations would be sold if "snippets" were available for search on Google. What other books fall into that category, namely books that would lose their entire value were they Google-ized?)
Third, and cutting both ways, transaction costs strike me as the central theme for debate here. Why can't Google ask for permission with respect to books for which the copyright holder is reasonably identifiable? Yahoo asks for permission. So do radio stations. It strikes me that the costs of asking are usually pretty modest as compared to the costs of scanning a full book. (With respect to works where the copyright holder is not reasonably identifiable, and hence those costs are not modest at all, should we have a special fair use defense, just as a way of helping new-technology projects like this take off?)
Lastly, remember that whatever legal rule we create here is a legal rule, not a Google-specific contract. Thus, we have to make sure that any fair use rights we articulate here will work in a world that has lots of players (Google, Yahoo, new startups, and so on) plus also dishonest players that will abuse the rules here in much the same way that the Grokster and Napster folks abused the rules related to small-scale sharing of music.
Can we write a rule that solves these problems, or create a set of legal rights that make a comparable project (like Yahoo's) possible? I hope so. The idea of an online database for print books is very exciting. The above are some of the difficulties that loom, however, and I would welcome some ideas on how to address them.
Your security concerns are overblown. If someone cracks Google's servers and manages to download some books and then is dumb enough to start distributing them, copyright holders have all the benefits of their monopoly rights to assert against these distributors. This is clearly analogous to file-sharing in other media where there is _no_ security protecting the content. I have yet to see a cogent and persuasive argument that music and movie sharing have reduced CD or ticket sales.
Second, orphaned works: If Google starts contacting copyright holders (I almost wrote "authors" there) for permission, that's an implicit admission that the use in question is not fair, and not permissible without permission, which opens Google to liability when the heirs or assigns of some orphaned work wake up to the fact that their copyright protected content is on Google's servers and sue for statutory damages and an injunction. By claiming all such uses are fair, Google avoids the question by facing the larger one, and, through current proceedings, has the opportunity to settle it before their investment is too large.
I see it as being like submarine patents. A stealthy copyright holder could sit on their rights, delaying any claim while Google Print grows in popularity, and then hold that success over Google's head in a suit in order to extort a large payout.
Of course, if the pending copyright reform legislation with respect to orphaned works is ever passed, then my latter argument is moot. I'd still think that Google's use is fair, but the argument about submarine copyrights would go out the window.
Posted by: bill | October 04, 2005 at 12:12 PM
Bill, in saying the security concerns are "overblown," it sounds like what you have in mind is a situation where a person accesses the database without authorization, the work is copied, and then *only that person* further distributes the work. In such a situation, it may be feasible to track down that distributor and sue him/her. But that's 1950s-era infringement; we live in a different world. Now the threat is, the digitized works are accessed, and they are very quickly made available everywhere, for all time, for free. The harm to the copyright owner is much more intractable in the second scenario, and therefore the security needs to be that much stronger. It's like the difference in the protocols you would use to protect a garden-variety flu virus from escaping the lab, versus smallpox.
Second, as Doug mentions, the rule developed has to apply to all cases, not just Google. Google's service may be pretty secure (although it's not just the server security; it's also how much of the work can be obtained in ways Google does not intend). But what if I want to establish the same service on my own, completely unsecured server? Is that still fair use? It's arguably not within the fourth fair use factor, because it's not my use that cuts into the market for the existing work (aside from the clip-book example Doug notes above), but the threat of what others who access my server without my permission may do. (I'd link to 17 U.S.C. sec. 107(4) but this blog doesn't allow links for some reason.)
In the current environment, I don't think anyone is holding "submarine copyrights." I believe the vast majority of copyright owners are much more worried that if they don't defend their copyrights now, they will be drastically undermined or even cease to exist in the future. Also, there is a three-year statute of limitations.
Posted by: Bruce | October 04, 2005 at 01:50 PM
I also think the security concerns are overblown.
The vast majority of books languish in obscurity, in some dusty library.
Perhaps that small fraction of books that are indeed likely to be sold in large quantities can be held back from the main resource?
Take a look at Tim O'Reilly's article in the Times: http://www.nytimes.com/2005/09/28/opinion/28oreilly.html
Posted by: Edward | October 04, 2005 at 03:53 PM
Bruce, I certainly did _not_ have that situation in mind, and I understand the dynamics of file-sharing quite well. Digitization and networking change everything. I understand that the genie can't be put back in the bottle. Setting aside the argument about orphaned works (i.e. considering only actively commercially exploited works), I see the problem a bit differently:
Mass digitization is coming. Many books are now available as eBooks and traditional, dead-tree books, and more and more will be available as eBooks in the future until someday when every book is released in printed and electronic versions. Every eBook you can imagine is already available on the Darknet from the day of its release. The latest "Harry Potter" was available for download within hours of its release (transcribed from the print version, no less). Popular material will be digitized (legitimately or not) for distribution online. If Google's servers are cracked, that person can and will be caught, prosecuted for cracking, and sued for copyright infringement (if they subsequently distribute), but to think that the material will not already be available in digital form is naive since odds are it will be (increasingly so as the years go by). So to say that there is a security concern with Google's activities is to ignore the elephant in the room.
It occurs to me, however, that one could quite reasonably argue that in its fair use of copyright protected material, Google (and others that imitate them) may have a duty to reasonably protect said material from unauthorized distribution. That is, if an indexer fails to take reasonable precautions against or through negligence allows wholesale release of the materials, they may incur some liability. That, in and of itself, doesn't make the use not fair.
On the submarine copyright issues:
1) three years is an eternity in Internet-time.
2) does the clock start ticking when the rightsholder becomes aware of the infringement, when the infringement first occurs, or when it last occurs?
I can see a number of submarine scenarios depending on the answers to those questions. Two years, 360 days is plenty of time for Google Print to become popular enough that Google would rather settle an infringement suit than abandon the project or face a judgement with full statutory damages. If the clock begins ticking with the last infringement, it would be quite easy to argue that infringement was continuous up until the time of the filing of the suit, thereby allowing indefinite delay.
How long was "Happy Birthday" in print before the copyright infringement cases? How long before ASCAP went after the Girl Scouts? Submarine copyright cases do happen.
As to fair use "factors," I recommend to you the same post of Bill Patry's that I recommended to Doug back on Prawfsblog: http://williampatry.blogspot.com/2005/09/google-revisited.html
Posted by: bill | October 04, 2005 at 06:41 PM
There are a couple of problems here. First off, Bartlett's Quotes isn't a good example of books that are harmed. It's a sweat-of-the-brow collection -- and Google undoubtedly has organized the facts (quotes) in a different manner than the book.
Secondly, the security concerns might be valid, but as Bill pointed out above, the digital availability of works is becoming standard. More importantly, I think that the authors might have a negligence action should Google fail to take adequate security measures, but I don't think this should be a factor of fair use. It's a separate issue.
Finally, the idea that fair use is an economic device to maximize efficiency when transaction costs are high seems wrong to me. I think fair use is more a public good issue, as a limitation on the grudgingly authorized monopoly of copyright. It's not "Use it if it is too hard to contact the author", it is "You have the right to use the work for certain purposes regardless of the author's wish".
My blog covers it a little better (I started the post on Sunday about your prawfsblog post, but life got in the way).
Posted by: Derek Conrad | October 04, 2005 at 08:45 PM
Even were Bartlett's to be a book that could be seriously harmed, the copyright holder can simply opt to have that book excluded from Google.
Posted by: Jason | October 04, 2005 at 09:49 PM
> Isn't that a huge target for some hacker who will want to then put those books online?
Most likely not. Search indexes tend not to have full-copy texts available. I imagine that you imagine that there is some huge library of full-copy digitized texts in the background sitting on Internet-connected servers. That need not be the case, and is a poor assumption to make. The details are technical and we don't know them all (there are various implementation possibilities), but the details cannot be ignored.
Posted by: Travis | October 06, 2005 at 12:34 AM