November 9, 2007 By Reid Goldsborough
Photo: Amsterdam Public Library
Pundits have described the Internet as the greatest boon to literacy since Johannes Gutenberg invented the printing press in the fifteenth century.
Despite the Internet's multimedia versatility, communication over the Internet remains largely words typed on a keyboard and read on a screen.
Some have even predicted that the Internet will do away with conventional printing just as paper replaced papyrus, clay, and lambskin. This may eventually happen, but the paper industry shows no signs of going away any time soon, and the paperless office remains a futuristic fantasy.
Still, the Internet has complemented many traditional print media. Books, however, have been a technological laggard. Lately, though, significant inroads have been made in printing books, buying books, reading books and, perhaps most interestingly, doing research with books.
Major players are involved, including Google, Microsoft and Yahoo, as well as some of the top libraries in the world. Google, the Internet's most popular search engine, is getting the most attention and creating the most controversy.
Google Book Search, formerly Google Print, lets you search through books for free, just as Google lets you search through the Web, with Google earning profits through advertising. In cooperation with university and public libraries as well as book publishers, Google is digitizing both out-of-copyright books and more recent books still subject to copyright protection.
On balance, giving people quick access to book knowledge is a good thing. The ultimate goal is the same as that envisioned by the builders of the great Library at Alexandria, completed by the Macedonian rulers of Egypt around 300 BC: archiving the world's knowledge in written form.
Google has been as aggressive as these ancient archivists, employing thousands of workers around the world to scan books to create its own universal library. It has also been aggressive in how it interprets the fair use aspect of copyright law, including a book in its repository unless the copyright holder tells it not to. Both moves have led to controversy.
The Authors Guild and the Association of American Publishers separately sued Google for copyright infringement, contending that Google Book Search will hurt authors.
But you can see only a very limited amount of any book still in copyright. Google contends that the current book component of its service is more a book marketing program than an online library. Depending on the permissions given by the copyright holder, a viewer can typically see either snippets of text or a small number of pages surrounding the search term. Google also gives copyright holders the option of removing a book from Google Book Search.
The way Google scans books has also been criticized. Google won't disclose its techniques, but reports indicate that it uses, at least in part, a robotic technique without a human being checking the results. As a result, some pages are unreadable, some are scanned more than once, some are in the wrong place, and some are cut off.
Much of this scanning takes place abroad. It's significantly less expensive to scan a book in China than in Des Moines. But this causes the descriptive data associated with any book -- including its title, author, date of publication and category -- to be wrong more often than it should be, making the archive less useful.
Google Book Search has been operational since late 2004, though Google still indicates it's in the beta, or testing, stage.
Google isn't the only guy in town trying to create a universal library. Microsoft is engaged in a similar effort, called Live Search Books, that is associated with its Live Search service.