ALA 2006: Google Book Search

Ben Bunnell, Manager of Google Book Search and author of an upcoming Last Byte column in the July NetConnect (no link yet), described how Google cofounder Larry Page and Marissa Mayer originally conceived of the book scanning project while they were in graduate school at Stanford. Using a metronome, they estimated that a 300-page book would take 40 minutes to digitize. Though the overall timeline wasn’t addressed at the session, other panels mentioned that the entire University of Michigan library collection of 7 million books is slated for completion in six to seven years. Libraries will be interesting places in 2010.

Google’s stated goal is to “digitize all books,” and Bunnell said “Google is not focused on author, genre, or time period.” Lawsuits from the Authors Guild and others have slowed progress.

There are three areas of digitization: publisher agreements for recently published books (Elsevier is the exception, and one panelist quipped that Google should buy Elsevier); books currently in the public domain (published before 1923); and what Tim O’Reilly calls the “twilight zone” (75% of what has been published).

The easy part is scanning books in the public domain (before 1923 in the US). This includes Jane Austen, Charles Dickens, Emily Dickinson, and Shakespeare. Other digital projects have started with this, including Project Gutenberg, Early English Books Online, and the Making of America project. The public domain content makes up 20 percent of all available books.

Google already has agreements with all major US publishers and is getting digital copies directly for books in print, which make up 5 percent of the total.

The controversy comes with books published from 1923 to 2000. Google is continuing to scan these books and display their contents in “snippet view” and as a selected number of pages. Searches currently show three snippets.

Following the presentation, discussion revealed that Google now has an agreement with the Library of Congress as well as the other five libraries in the Google Print project (University of Michigan, Oxford, Harvard, Stanford, and NYPL). The “Find It in a Library” links are live for some books that were originally scanned via libraries, and Bunnell said they “are close to linking all books.” Google wants users to alert it to the copyright status of a book, so it seems reasonable to expect a contact link to show up soon. Finally, the link syntax of Google Books is static, so one audience member asked if it would be possible to link from a library catalog to the online copy. This is possible, but it requires that the patron have a Google Account, which raises privacy concerns.

Recommendation for Google: If you’re going to hold a panel from noon to one, bring along your Googleplex chef and feed the hungry librarians.