On October 23, 2003, Amazon.com made it possible to “search inside” the full text of 120,000 books from more than 190 publishers. This allowed Amazon users to search the full text not only of individual titles but of all 120,000 collectively.
On October 23, 2003, journalist Gary Wolf published an article in Wired magazine about the cultural history of digital libraries and, more specifically, Amazon's "Search Inside." It was entitled "The Great Library of Amazonia," and I quote a portion of it here:
"The more specific the search, the more rewarding the experience. For instance, I've recently become interested in Boss Tweed, New York's most famous pillager of public money. Manber types 'Boss Tweed' into his search engine. Out pop a few books with Boss Tweed in the title. But the more intriguing results come from deep within books I never would have thought to check: A Confederacy of Dunces, by John Kennedy Toole; American Psycho, by Bret Easton Ellis; Forever: A Novel, by Pete Hamill. I immediately recognize the power of the archive to make connections hitherto unseen. As the number of searchable books increases, it will become possible to trace the appearance of people and events in published literature and to follow the most digressive pathways of our collective intellectual life.
"From the Hamill reference, I link to a page in the afterword on which he cites books that influenced his portrait of Tweed. There, on the screen, is the cream of the research performed by a great metropolitan writer and editor. Some of the books Hamill recommends are out of print, but all are available either new or used on Amazon.
"With persistence, serendipity and plenty of time in a library, I may have found these titles myself. The Amazon archive is dizzying not because it unearths books that would necessarily have languished in obscurity, but because it renders their contents instantly visible in response to a search. It allows quick query revisions, backtracking, and exploration. It provides a new form of map.
"Getting to this point represents a significant technological feat. Most of the material in the archive comes from scanned pages of actual books. This may be surprising, given that most books today are written on PCs, e-mailed to publishers, typeset on computers, and printed on digital presses. But many publishers still do not have push-button access to the digital files of the books they put out. Insofar as the files exist, they are often scattered around the desktops of editors, designers, and contract printers. For books more than a few years old, complete digital files may be lost. John Wiley & Sons contributed 5,000 titles to the Amazon project -- all of them in physical form.
"Fortunately, mass scanning has grown increasingly feasible, with the cost dropping to as low as $1 each. Amazon sent some of the books to scanning centers in low-wage countries like India and the Philippines; others were run in the United States using specialty machines to ensure accurate color and to handle oversize volumes. Some books can be chopped out of their bindings and fed into scanners, others have to be babied by a human, who turns pages one by one. Remarkably, Amazon was already doing so much data processing in its regular business that the huge task of reading the images of the books and converting them into a plain-text database was handled by idle computers at one of the company's backup centers."