4406 entries. 94 themes. Last updated December 26, 2016.

3. From Cuneiform Archives to Search Engines: The History of Bibliographical Control, Indexing and Searching

In this chapter I will trace the history of describing, indexing and searching information in libraries leading eventually, through many intermediate steps over the centuries, to the development of web search engines, which search the "virtual library without walls." Because most of us use web search engines in our daily lives, the history of these topics may be of increasing interest. While my first two chapters emphasized book history, the later portions of this chapter will combine book history with more technical detail regarding computing. Like most other topics in the history of information, this is a long story. In its present form the chapter is mainly an outline. As time permits I will fill in more details and more interpretation.

Information is so limited regarding the organization and cataloguing of libraries in the ancient world that it is not difficult to summarize. Early in the history of books and libraries, beginning circa 200 BCE according to the very fragmentary surviving records, the poet Kallimachos, head of the Royal Library of Alexandria, sought control over the presumed hundreds of thousands of papyrus rolls his library contained by writing tables or lists. These lists, called pinakes in Greek, are considered the beginning of bibliography. It is believed that Kallimachos used an alphabetical arrangement by author as part of his organization scheme, and though we cannot credit Kallimachos with the invention of the alphabetic system, it is believed that the system may first have been put to effective use at the Alexandrian Library (Daly, Contributions to a History of Alphabetization in Antiquity and the Middle Ages (1967) 25; see also Blum, Kallimachos: The Alexandrian Library and the Origins of Bibliography. Trans. H. H. Wellisch (1991)). Regarding the content, arrangement, and shelving of the Alexandrian Library we know remarkably little, and we know almost nothing about the organization of the library at Pergamum. We know that papyrus rolls were stored in Greek and Roman libraries in book cabinets with or without doors, often called armaria, in a system known as “pigeon holes,” and we could surmise that these armaria might have been arranged in some subject order, but we have little or no documentation concerning this. Available information suggests that after the fall of the Roman empire, early medieval European monastic libraries, intended for the use of members of their monastic communities, continued to store papyrus rolls and papyrus or parchment codices in armaria, even as the papyri gradually disintegrated in the damp European climate, and were eventually copied onto more permanent parchment codices. 
These cabinets and chests, probably little different than what were used in ancient Greece or Rome, were used for book storage well through the thirteenth century and later.

One of the earliest and most widely used cross-indexing systems was the Eusebian canons or Eusebian sections, also known as Ammonian Sections, implemented circa 280-340 CE during the transition from the roll to the codex. The Eusebian canons were a system of dividing the Four Gospels used from late Antiquity through the Middle Ages. The sections are indicated in the margin of nearly all Greek and Latin manuscripts of the Bible from the mid to late fourth century onward, and they are usually summarized in Canon Tables at the start of the Gospels. There are about 1165 sections: 355 for Matthew, 235 for Mark, 343 for Luke, and 232 for John; the numbers, however, vary slightly in different manuscripts. These tables represent a way for the reader to move back and forth between related sections in the texts, and are an early organizational structure and cross-indexing system.
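The way a canon table lets a reader move between related sections can be sketched as a small lookup structure. This is only an illustrative model, not a transcription of any manuscript: the section numbers in `CANON_ROWS` are placeholders, and the names `build_index` and `parallels` are hypothetical. Each row stands for one line of a canon table, giving the section number in each Gospel that treats the same material.

```python
from collections import defaultdict

GOSPELS = ("Matthew", "Mark", "Luke", "John")

# One tuple per canon-table row: the section number in each Gospel, or None
# where that Gospel has no parallel. Placeholder numbers, for illustration only.
CANON_ROWS = [
    (8, 2, 7, 10),
    (11, 4, 10, 6),
    (15, 6, 15, None),
    (None, None, 30, 219),
]


def build_index(rows):
    """Map each (gospel, section) to the parallel (gospel, section) pairs."""
    index = defaultdict(list)
    for row in rows:
        present = [(g, s) for g, s in zip(GOSPELS, row) if s is not None]
        for entry in present:
            index[entry].extend(p for p in present if p != entry)
    return index


INDEX = build_index(CANON_ROWS)


def parallels(gospel, section):
    """Return the sections in other Gospels listed alongside this one."""
    return INDEX.get((gospel, section), [])
```

A reader who meets a marginal section number in one Gospel would, in this model, call `parallels("Mark", 2)` to find the corresponding sections elsewhere, which is the role the Canon Tables at the start of the Gospels played on the page.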

Once we reach the Middle Ages, a fundamental point to make in any summary such as this is the enormous diversity of medieval libraries in different cities, towns and countries over the roughly one thousand years through which the Middle Ages extended. Because of this diversity, generalizations must be taken with caution. There is also an immense amount of scholarship regarding medieval library history, including editions, better or worse, of the catalogues of medieval libraries, which tended to be simple listings. This scholarship on the history of medieval libraries has been built up since at least the mid-nineteenth century. Because it was necessary to determine the authenticity of medieval legal documents, scholarship on paleography is considerably older, dating back at least to the seventeenth century writings of Mabillon on Latin charters and the writings of Montfaucon on Greek paleography. A useful guide online is C. D. Wright, Medieval & Modern Manuscript Catalogues. Much more extensive is Paul O. Kristeller, Latin Manuscript Books before 1600: A List of the Printed Catalogues and Unpublished Inventories of Extant Collections, 4th ed. by Sigrid Krämer, MGH Hilfsmittel 13 (Munich, 1993). The on-line version is searchable, and divided into sections: A: Bibliography and Statistics of Libraries and Their Collections of Manuscripts; B: Works Describing Manuscripts of More than One City or Groups of Libraries; C: Printed Catalogues and Handwritten Inventories of Individual Libraries, by City; D: Directories and Guides to Libraries and Archives. Much of this material may be available in microform or online. A work which I have found useful is Christ, The Handbook of Medieval Library History. Translated and edited by T. M. Otto from the Handbuch der Bibliothekswissenschaft. . . (1984).

To make books more accessible to increasing numbers of students, toward the end of the thirteenth century librarians at the University of Paris and other universities took some books out of the armaria, in which, following Roman models, they had previously been arranged in monastic libraries, and arranged them on desks by subject matter, chaining many of the books to the desks where students could read them. They also arranged the books remaining in armaria by subject matter, and provided a subject catalogue of their holdings. “The arrangement and cataloguing of books within the individual colleges and other university institutions were also influenced by the changes in book usage reflected in the union catalogs and location lists. In monastic institutions, book collections had traditionally been kept in book chests or armaria — though the individual volumes themselves doubtless were, for much of the time, parceled out among the members of the house. We find, however, in the writings of the Dominican Humbert of Romans, about 1270, instructions that books in the armaria should be physically arranged by subject matter, and that certain ones of them should be chained at lecterns for the common use of all, rather than being either locked away in a chest or loaned for the use of only one person. Before the end of the thirteenth century, both the Collège de Sorbonne in Paris [University of Paris] and University College in Oxford had such a collection of chained books attached to reading benches. Early in the next century, about 1320, a member of the Sorbonne compiled a subject catalog of the hundreds of individual texts bound together in some three hundred chained codexes of his college. 
This development — arrangement of manuscripts by subject matter, affixing chains to selected books, an index of the content of a whole collection — corresponds in its way, in both purpose and ingenuity, to the making of concordances, distinction collections, subject indexes, and union catalogs; and it is in such a context that it should be considered. The common goal of all these devices was to facilitate access to desired information” (Rouse & Rouse 1991, op. cit., 238-39). In 1289, the library of the University of Paris, which was then probably the largest and best library in Europe, was organized into two collections: the magna libraria in which the most frequently used books were chained and made available for general use for teaching and course work, and the parva libraria which contained duplicates, and more specialized works needed for research, which could be loaned out to qualified users. The library included 1017 books at this time. This information comes from a catalogue of the library written in 1338 which incorporated a catalogue of the library written in 1290, of which only two leaves partially survived as pastedowns. “The importance of the establishment of a chained library, in the broader picture, is that it established a place where books were not merely kept but where they were used, and used in common. This change at the Sorbonne in 1289-92 is part of a general trend to divide collections, which appears in Europe at the end of the thirteenth and continues through the fourteenth century. Institutions began to divide their collections by causing certain commonly used works to be chained so that these would always be available to their members, while at the same time continuing to provide for the individual needs of their members and outsiders through a circulating collection. 
The Sorbonne probably provides the earliest clear example of this change taking place” (Rouse & Rouse 1991, op cit., 364, and 343, 352, reproducing a leaf of the 1290 catalogue as plate 8). As primitive as the system may seem to us, chaining codices to desks with iron chains was a new development in library management, made feasible by the heavy bindings, usually of leather over wooden boards, which made it practical to attach a chain to a binding. Such an aggressive security system was a reflection both of the high cost of the codices and the great difficulty of replacing a manuscript that might be missing or stolen. The system remained in use in certain libraries well through the fifteenth century, and was preserved in a few libraries, such as that at Hereford Cathedral, even later, but seems to have been abandoned with the expansion of libraries after printing was well established. One of the only monastic humanistic chained libraries that remains intact, with its original manuscript codices chained to its original desks, is the Biblioteca Malatestiana founded in the mid-fifteenth century just before the introduction of printing, in Cesena, Emilia-Romagna, Italy. Prior to the development of the codex, it may have been equally expensive and difficult to replace lost papyrus rolls in Greek and Roman libraries but there was no practical secure way to attach a papyrus roll to a desk.

By the end of the thirteenth century there were as many as twenty thousand foreign students resident in Paris. Such a large community of students and their teachers contributed to significant developments in the history of the book. The first alphabetical indexing tools for books were developed in Paris by university teachers and members of religious orders as reference tools for preachers. Both groups shared responsibility for preaching to the laity. Besides these indices prepared for preachers, Paris schools developed the first reference works designed to facilitate access to texts for strictly scholarly purposes, without any application in the preparation of sermons. "By mid-century, there were alphabetical indexes to the majority of works in the Latin Aristotelian corpus, Old Logic, New Logic, the Ethica, the Libri naturales. Since these reference tools are anonymous, it is obviously impossible to prove that they originate at Paris; but the combination of the two activities, Aristotelian studies and creation of indexes, can point nowhere else at this period" (Rouse & Rouse, Authentic Witnesses: Approaches to Medieval Texts and Manuscripts (1991) 229).

As we have seen, the production techniques and features of manuscript books evolved slowly. Advances originating in the thirteenth century, including the pecia system, indexing, and other accessibility tools such as library arrangement, remained operational, and represented essentially the state of the art when the mass production method of printing was introduced to European book production in 1455. The rate of change, seen in the very slow transition from the roll to the codex, from majuscule to minuscule, and in the introduction of indexing and other organizational features of manuscripts in the thirteenth century and earlier, hardly increased before the invention of printing. Centuries were usually involved in each transition. It was only after the invention of printing that the rate of change and social disruption accelerated with the growing availability of books. And yet even though artisans involved in manuscript production were sometimes forced to adopt the new technology, the disruption from printing was not so much an internal one within the book production trades. The impact of the new technology was far greater on society through the increasing availability, lower cost, and more rapid distribution of information.

After the introduction of printing, the challenge for librarians and bibliographers became one not only of protecting and organizing their valuable collections of manuscripts, which because of scarcity and high cost had grown relatively slowly, if at all, before the introduction of printing, but for the first time of managing the rapid growth of information which became much more readily available, and at comparatively lower cost. At least partly because of the increased availability of information after the development of printing, by 1505, when he left the Abbey at Sponheim, polymath Johannes Trithemius had expanded its library to 2000 volumes of printed books and manuscripts from the 40 works present in the library when he became Abbot in 1482. Two thousand volumes represented an exceptionally large library for the time, so this achievement was also due to Trithemius's skill and tenacity as a book collector. For librarians, managing the physical space required for storage, and organizing books and manuscripts, both in their physical location and in their subject/author classification and cataloguing, have remained relatively constant challenges ever since. Efforts to solve these organizing, indexing, and accessibility problems led eventually, through centuries of effort and many incremental steps, to today's virtually instantaneous searchability of digital information.

Earlier in this essay I discussed the development of title pages in the latter part of the fifteenth century as an innovation resulting from the desire of printers to identify and sell their editions. By the early sixteenth century standardization, for the most part, of title page information in printed books to include the author’s name, title, place, publisher and date, made it easier to identify books. Of course, for every book that may have supplied this basic bibliographical information there were others published under pseudonyms, or anonymously, or with bogus imprints or inaccurate dates, or with elements of the imprint information printed on the colophon leaf rather than on the title page. More significantly, perhaps, the rising tide of printed information made selection of appropriate books on the expanding range of subjects increasingly complicated. Moreover, it was becoming difficult to understand and classify books by subject. Prior to the invention of printing, and probably prior to the mid-sixteenth century, scarcity of information, if it was noticed, or the high cost of books, might have been perceived as greater problems than overload. We might tentatively observe that in the time of the physician and bibliographer Conrad Gessner information overload was just beginning to be experienced by those who could afford books or who had access to good libraries. To list the most useful and authoritative books by subject, Gessner published a “universal” classified bibliography (1545-55), and an index of knowledge in “all” printed books (1548-49). Compiling the Bibliotheca Universalis was such a challenge that Gessner confessed the profound sense of freedom he experienced when he finished the massive work in 1545: "In truth I rejoice and thank God because I have finally gotten out of the labyrinth in which I was trapped for almost three years" (Balsamo, Bibliography: History of a Tradition (1990) 32). 
Still, roughly ninety years after the introduction of printing, completion of a relatively complete universal bibliography and knowledge index remained within the grasp of one very talented and driven man.

As helpful as some consistency in bibliographical information was, and as much improved as indices to printed books or sets of books could be, speed of access to data in analog indices was always limited. Accessing data in library catalogues presented greater problems, especially as the amount of information indexed in many library catalogues continually increased, eventually resulting in the perception of information overload from excessive numbers of printed books, to which Denis Diderot referred in 1755:

"As long as the centuries continue to unfold, the number of books will grow continually, and one can predict that a time will come when it will be almost as difficult to learn anything from books as from the direct study of the whole universe. It will be almost as convenient to search for some bit of truth concealed in nature as it will be to find it hidden away in an immense multitude of bound volumes. . . ."

Organizing some of that knowledge, and making it readily available, were challenges and purposes of Diderot and d’Alembert’s Encyclopédie ou dictionnaire des sciences, des arts et des métiers, par une société de gens de lettres. Along with writers such as Diderot and d'Alembert, librarians, library cataloguers, bibliographers and publishers of reference works traditionally shared the goals of organizing information, and making it accessible. The business of indexing and searching information, now conducted on previously unimaginable scale at electron speed in the “universal library without walls” by web search engines, evolved through the history of books and libraries.

The first national code for descriptive cataloguing was implemented circa 1789-1791 after the French revolutionary government nationalized numerous libraries and archival repositories. Seized books were brought to literary depots at several locations in Paris. The staff at each depot was ordered to record the basic details about each item on cards. These cards were then bound up in bundles and sent to the Paris Bureau de Bibliographie. Because of wartime shortages, the blank backs of confiscated playing cards were used to record the information. The title page was transcribed on the card and the author’s surname underlined for the filing word. If there was no author, a keyword in the title was underlined. A collation was added that was supposed to include the number of volumes, size, a statement of illustration, the paper or parchment on which the book was written or printed, the kind of type, any missing pages, and a description of the binding if it was outstanding in any way. The collation was partly for the purpose of identifying valuable books that the government might offer for sale in order to increase government revenue. After the cards were filled in and put in order by the underlined filing word, they were strung together by running a needle and thread through the lower left-hand corners to keep them in order. This was one of the first recorded uses of cards for library cataloguing.
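The filing rule described above, in which the author's surname is the filing word and a title keyword is used when there is no author, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `Card` class, its field names, and the sample cards are hypothetical and are not drawn from the historical instructions, and the collation details are omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Card:
    title: str                     # title page as transcribed on the card
    author_surname: Optional[str]  # underlined as filing word when present
    title_keyword: str             # underlined only when there is no author

    def filing_word(self) -> str:
        """Return the underlined word under which the card is filed."""
        return (self.author_surname or self.title_keyword).casefold()


def file_cards(cards):
    """Put cards in order by their underlined filing word."""
    return sorted(cards, key=Card.filing_word)


# Hypothetical sample cards, for illustration only.
sample = [
    Card("Essais", "Montaigne", "Essais"),
    Card("Coutumes de Paris", None, "Coutumes"),
    Card("Candide", "Voltaire", "Candide"),
    Card("Discours de la méthode", "Descartes", "Discours"),
]

ordered = [card.filing_word() for card in file_cards(sample)]
```

In this sketch the anonymous work is filed under its keyword among the authors' surnames, which is the single alphabetical sequence that the needle and thread then held in place.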

Although the French revolutionary government attempted to impose some order in the chaotic redistribution of library holdings, if only for monetary gain, most French libraries which were not confiscated after the revolution, including some which were founded during the Middle Ages, continued to maintain their traditional hand-written catalogues in book form, some of which were imprecise or had accumulated errors, or listed books which could no longer be found. Following medieval subject arrangements, many early library catalogues in various countries simply listed books and their physical locations within libraries. Partly because of vagueness in cataloguing, library security, if any, was often extremely lax, sometimes resulting in additional clandestine redistribution of books by thieves, of whom the most notable in the nineteenth century, if not for all time, was the mathematician, paleographer, and pioneer historian of science, Guglielmo Libri. Reorganization of the French library and archive system required the better part of the nineteenth century.

If the early hand-written library catalogues were not very helpful, researchers could, of course, consult numerous specialized printed subject bibliographies, or the limited printed catalogues of certain libraries, to determine what they might be looking for. The first widely appreciated rules for standardizing library cataloguing were promulgated by Antonio Panizzi in 1841. As these, and other cataloguing and classifying rules, such as the Dewey Decimal Classification (1876), were gradually implemented, users of libraries could expect more reliable research help from author and subject catalogues. Eventually, massive library card catalogues were created, and sometimes published in print in monstrous sets of folio volumes. Libraries willingly allocated space to these enormous sets when they were the only way to access the information; most of these monsters have since found their way to landfills. (In January 2011 a university in Virginia offered to pack up and send a set of the 756 folio volumes of Mansell's National Union Catalogue of Pre-1956 Imprints to any institution that would be willing to pay the freight. I doubt if there were any takers.) A few systems of index cards, such as that of the Institut International de Bibliographie, founded by Paul Otlet and Henri La Fontaine in 1895, extended their cataloguing goals far beyond the range of library science, becoming almost unbelievably ambitious in their efforts to index knowledge, and essentially represented analog search engines.

With the expanding volume of physical information, librarians were concerned about their ability to find space to house their growing collections of monographs and periodicals. Before digitization, microfilming bulky collections seemed like the most efficient way to conserve space and to share copies of such information with other scholars and institutions. Searching through reels of microfilm was tedious, but compression of the information in this form opened the possibility of searchability, and inventors sought ways to both index and speed up access to information on rolls of microfilm. In 1931 Emanuel Goldberg of Zeiss Ikon received U.S. Patent No. 1,838,389 for a "Statistical Machine." This patent described an electronic machine for searching through data encoded on reels of film, using "radiating energy to actuate a recorder when the explored indications upon the search plate and record element are identical, the indications on one of said elements being penetrable by the rays and the indication on the other element being impenetrable by the rays." Later, Vannevar Bush incorporated technology similar to this in the Rapid Selector machine on which he began development in 1938. The existence of Goldberg's patent prevented Bush from patenting his Rapid Selector. Bush's machine became famous after publication in July 1945 of his Atlantic Monthly article, "As We May Think," describing the Memex. In September of the same year Bush published a condensed, illustrated version of "As We May Think" in Life magazine. Life's editors added the following subtitle: "A Top U.S. Scientist Foresees a Possible Future World in Which Man-Made Machines Will Start to Think." They also replaced the Atlantic Monthly's numbered sections with headings, and added illustrations of the "cyclops camera," the "supersecretary" and the "Memex" in the form of a desk. This was the first published illustration of what the Memex might have looked like. 
Because the hypothetical Memex was capable of making permanent associative links in information it foreshadowed aspects of the personal computer and hyperlinks on the Internet.

In 1951 Louis N. Ridenour, Ralph R. Shaw, and Albert G. Hill published a thin book entitled Bibliography in an Age of Science. This book published three lectures delivered at the University of Illinois the previous year. Though it was preceded by journal articles and technical reports, this may be the first separately published book to address the problems of applying new technologies to the searching and storage of printed information in libraries. Shaw's article includes illustrations on pp. 60-61 of the Rapid Selector prototype which was in operation at this time. This machine, which applied the ideas of Emanuel Goldberg and the Memex idea of Vannevar Bush, stored 72,000 frames of information on a 2,000 foot reel of film. The prototype could search through the data at the rate of 78,000 "codes per minute." "Improvement of this searching speed to 120,000 codes per minute is now in sight." As far as I have been able to determine, no further work was done on this project beyond the prototype. Instead, research was directed toward using electric punched card tabulators for information retrieval, or anticipating the use of the new high-speed digital computers, of which the UNIVAC I was the first actually delivered to a customer, the U.S. Census Bureau, in 1951.

As the amount of printed information expanded exponentially, the labor involved in creating library card catalogues inexorably increased along with the cumbersome aspects of using them. With the rapid growth of medical research after World War II, the Army Medical Library (now the National Library of Medicine), which through Index Medicus was required to index all journal articles in all languages concerning medicine, was challenged to control an exploding number of publications. Automating the creation, indexing and searching of Index Medicus became one of the first topics of research on information retrieval, or online database development, in The Army Medical Library Research Project at the Welch Medical Library at Johns Hopkins University as early as 1949, even before electronic computers were available for sale. Another very early bibliographically-related information retrieval project, considered the foundation of humanities computing, was Roberto Busa’s Index Thomisticus, started by Father Busa with the support of IBM as early as 1949. During the 1950s, with the rapid development of the mainframe industry, Hans Peter Luhn of IBM developed automated systems for encoding library information in 1957 and for the production of literature abstracts in 1958. Auto-indexing and auto-abstracting became news stories. Eugene Garfield’s citation analysis was first published in 1964 in five printed volumes, indexing 613 journals and 1.4 million citations. NLM’s Medical Literature Analysis and Retrieval System (Medlars) eventually became operational in January 1964. This was the first large scale computer-based retrospective library search service available to the general public. However, Index Medicus continued to be published in book form, as it had been since 1879, and in October 1971 the National Library of Medicine first brought Index Medicus online through Medline (Medical Literature Analysis and Retrieval System Online).

Shortly before NLM brought Index Medicus online Edgar F. Codd of IBM published "A Relational Model of Data for Large Shared Data Banks" in Communications of the ACM 13 (1970): 377–387. Codd’s model became widely accepted as the definitive model for relational database management systems. Codd postulated that data should be stored independently from hardware and that a programmer should use a nonprocedural language for accessing data. The crux of Codd’s solution was that data, rather than being stored in a hierarchical structure, would be stored in simple tables composed of rows and columns in which columns of like data would relate tables to one another. A database user or application, in Codd’s way of thinking, would not need to know the structure of the data in order to query that data. Codd's work led to rapid growth in the development of databases of all kinds yet traditional, non-automated means of accessing large amounts of information remained. Even with increasing adoption of Medline, Index Medicus continued to be published in print until December 2004--an excellent example of the persistence of print, perhaps as a result of bureaucratic indecision, resulting in continuing accumulation of dense and difficult to access printed volumes long after the printed data had been supplanted in accessibility and ease of use by an online database.
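The idea of simple tables related through columns of like data, queried with a nonprocedural language, can be illustrated with a minimal sketch. It uses SQL, a query language developed after Codd's paper, through Python's standard `sqlite3` module; the table names, column names, and sample rows are assumptions made for illustration and do not describe Medline or any real database.

```python
import sqlite3

# An in-memory database with two simple tables of rows and columns.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE journals (
        journal_id INTEGER PRIMARY KEY,
        name       TEXT
    );
    CREATE TABLE articles (
        article_id INTEGER PRIMARY KEY,
        journal_id INTEGER REFERENCES journals(journal_id),
        title      TEXT,
        year       INTEGER
    );
    """
)

# Hypothetical sample data; journal_id is the column of like data
# that relates the two tables to one another.
conn.executemany(
    "INSERT INTO journals VALUES (?, ?)",
    [(1, "Journal A"), (2, "Journal B")],
)
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?, ?)",
    [
        (10, 1, "First article", 1969),
        (11, 2, "Second article", 1970),
        (12, 1, "Third article", 1970),
    ],
)


def titles_in_journal(connection, journal_name):
    """State what is wanted; the database decides how to retrieve it."""
    rows = connection.execute(
        """
        SELECT a.title
        FROM articles AS a
        JOIN journals AS j ON a.journal_id = j.journal_id
        WHERE j.name = ?
        ORDER BY a.year, a.title
        """,
        (journal_name,),
    )
    return [title for (title,) in rows]
```

The query names the tables and the condition to be met but says nothing about how the rows are physically stored or in what order they are scanned, which is the separation between data and access path that the paragraph above attributes to Codd.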

Through the 1960s and early 1970s, for the most part, library card files, or catalogues printed from the cards, as cumbersome as they were, remained state of the art for indexing and searching library holdings. In 1967 the colleges and universities in the state of Ohio founded the Ohio College Library Center (OCLC) to develop a computerized system in which the libraries of Ohio academic institutions could share resources and reduce costs. After the bibliographical database expanded far beyond the state of Ohio it was renamed Online Computer Library Center, retaining the same initials. By 2007 OCLC incorporated the online catalogues of more than 27,000 libraries and contained “1.1 billion catalogued items.”

Some early library-related information retrieval projects were influential upon the thinking of pioneers in other aspects of computing, such as J.C.R. Licklider. A psychologist, Licklider was especially interested in the relationship of people to computers. He was also interested in the relationship of the physical information in libraries to the digital information stored in the mainframes of the time, and in making the growing body of information stored in libraries more accessible. In 1960 Licklider published Man-Computer Symbiosis, postulating that the computer should become an intimate symbiotic partner in human activity, including communication. In 1962 Licklider and Welden E. Clark published “Online Man-Computer Communication,” calling for time-sharing of computers, for graphic displays of information, and for an improved graphical interface. After he was appointed Director of the Pentagon's Information Processing Techniques Office (IPTO) in 1962 Licklider sent a memo to members and affiliates of what he jokingly called the "Intergalactic Computer Network," outlining a key part of his strategy to connect all their individual computers and time-sharing systems into a single computer network spanning the continent. In November 1964 a meeting between Licklider and Lawrence G. Roberts motivated Roberts to undertake the creation of the ARPANET. Less than a year later, in October 1965, Roberts conducted the first "actual network experiment", tying MIT Lincoln Labs' TX-2 computer to System Development Corporation's Q32. This was the first time that two computers talked to each other, and the first time that packets were used to communicate between computers. Also in 1965 Licklider published a book, now for the most part forgotten, entitled Libraries of the Future.

Two years later Lawrence Roberts published the first paper on the design of the Arpanet: “Multiple computer networks and intercomputer communication.” The following year Licklider and Robert W. Taylor published The Computer as a Communication Device in which they described features of the future Arpanet. On October 29, 1969 the first message was sent over the Arpanet from Leonard Kleinrock's UCLA computer to the second node at Stanford Research Institute's computer. The message was simply "Lo." In March 1970 Arpanet established a node at Bolt Beranek and Newman in Cambridge, thereby spanning the U.S. By 1971 the Arpanet had 15 nodes. In 1973 the first international connections were made to the Arpanet. After continued expansion, in 1990 the Arpanet discontinued operations and was folded into the Internet.

Related to education and culture, in 1987 Fred Mintzer, Henry Gladney and colleagues at IBM developed a high resolution digital camera for photographing art works and a PC-based database system to store and index the images, in order to record the work of the painter Andrew Wyeth. The system was used by Wyeth's staff to photograph, store, and organize about 10,000 images. "Pictures were scanned at a spatial resolution of 2500 by 3000 pixels and a color depth of 24 bits-per-pixel, and were color calibrated." This was the first digital image database of cultural materials.

In March 1989 Tim Berners-Lee at CERN wrote Information Management: A Proposal, proposing an Internet-based hypertext system. ARCHIE, a program designed to index FTP archives, was developed in 1990 by three students at McGill University: Alan Emtage, Bill Heelan, and Peter J. Deutsch. ARCHIE was the first “search engine,” as distinct from a “web search engine.” By November 12, 1990 Berners-Lee was planning the World Wide Web, issuing from CERN World Wide Web: Proposal for a Hypertext Project. The following day Berners-Lee wrote the first web page. Over the Christmas holiday Berners-Lee wrote the software tools necessary for a working World Wide Web:

A. The first web browser called WorldWideWeb.


C. The first web server, CERN httpd. It was operational on Christmas Day 1990.

In March 1991 Berners-Lee released the first web browser, WorldWideWeb, to a number of people at CERN. On August 6, 1991 Berners-Lee made web server and web browser software available at no cost. On April 30, 1993 CERN released World Wide Web software into the public domain. This was a critical step in the worldwide adoption of the web.

On March 4, 1993 programmer Marc Andreessen announced on Usenet the creation of the Mosaic browser and the introduction of the image tag. Mosaic was the first graphics-based web browser. The National Center for Supercomputing Applications (NCSA) released the browser on April 22, 1993. On April 4, 1994 Andreessen and James H. Clark of Silicon Graphics founded Mosaic Communications Corporation, the first company to exploit the potential of the Mosaic web browser, and the first to exploit the economic potential of the World Wide Web. Their first product, Mosaic Netscape 0.9 beta, was released on October 13, 1994.

The first "full text" crawler-based web search engine, WebCrawler, created by Brian Pinkerton at the University of Washington, became operational on April 20, 1994. "Unlike its predecessors, it let users search for any word in any web page, which became the standard for all major search engines since. It was also the first one to be widely known by the public." In January 1996 Larry Page and Sergey Brin, students of computer science at Stanford, began collaborating on a search engine called BackRub, named for its unique ability to analyze the "back links" pointing to a given website. This became the Google web search engine. Google’s PageRank algorithm was conceptually adapted and expanded for the Internet from citation analysis, the ranking of printed scientific papers by the citations they receive. In January 1998 Page, Brin, Rajeev Motwani, and Terry Winograd of the Stanford Database Group published the paper The PageRank Citation Ranking: Bringing Order to the Web, which observed: "The worldwide web creates many new challenges for information retrieval. It is very large and heterogeneous. Current estimates are that there are over 150 million web pages with a doubling life of less than one year." These and other developments resulting from information retrieval research originating at brick and mortar libraries, together with advances in computing and networking, eventually led to the present virtually instantaneous speed of searching digital books, databases, and web pages.
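For readers who want to see the "back links" idea in concrete form, the following is a minimal sketch in Python of ranking pages by the links pointing to them, in the spirit of citation analysis. It is an illustration only, not Google's implementation: the function name pagerank, the damping value of 0.85, the fixed number of iterations, and the four-page toy graph are all assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by repeated redistribution of rank along links.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative assumptions, not values
    taken from this chapter.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with rank spread evenly over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Rank held by pages with no outgoing links is shared by everyone.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n
                    for p in pages}
        # Each page passes an equal share of its rank to the pages it cites.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three pages link to C, and C links back to A.
toy_web = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph the page with the most back links (C) receives the highest rank, and the page that C itself links to (A) ranks above the pages that nothing links to, much as a paper cited by a heavily cited paper gains standing.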

Electronic digital computing evolved out of research done in World War II. The world's first general purpose digital computer, the ENIAC, became operational in 1945. For the first ten years or more after the invention of the ENIAC there were fewer than twenty electronic computers in the world, and some visionaries thought we did not need that many. The first mainframes were extremely expensive and only available to governments, large corporations and research centers. Very gradually the cost of computing declined, but it was not until the development of personal computing in the 1980s that computing became affordable to most people, and it was not until the connection of hundreds of millions of personal computers to the Internet in the 1990s that the impact of computing was fully felt upon the widest reaches of society. By comparison, it took roughly the same amount of time, fifty years, for printing to spread throughout Europe. For the Middle Ages, in which all change occurred much more slowly than today, such a technological shift within 50 years was probably perceived by interested observers as every bit as fast and disruptive as whatever shift we are currently experiencing. One of the differences, of course, is that there are far more literate people living today, and computing has made the production of information far easier than printing by movable type made book production. For the first several hundred years of printing it was a difficult trade to learn, with very high capital and experiential costs of entry. Today virtually anyone with a personal computer and an Internet connection can become a self-publisher or blogger. More than a billion people use computers in the form of desktops, laptops, cell phones, tablets and other devices, and the production of information is staggering. In July 2008, only eighteen years after Berners-Lee's invention of the World Wide Web, Google announced that it was indexing over one trillion (1,000,000,000,000) unique URLs. We may presume that the second trillion will be reached far faster than the first. Most remarkable may be that it appears to take the search engines about as long to search through a trillion URLs, once they have been indexed, as it did for the same search engines to search a mere hundred million URLs a few years back: hardly any time at all!
