History of Information

On September 9–11, 1964, the first Literary Data Processing Conference took place at IBM's research facility in Yorktown Heights, New York. It was organized by Harry F. Arader of IBM and chaired by Stephen M. Parrish of Cornell and Jess B. Bessinger of NYU. This was the first conference on what came to be called humanities computing or digital humanities.

"Among the other speakers, Roberto Busa expatiated on the problems of managing 15 million words for his magnum opus on Thomas Aquinas. Parrish and Bessinger, along with the majority of other speakers, reported on their efforts to generate concordances with the primitive data processing machines available at that time. In light of the current number of projects to digitize literary works it is ironic to recall Martin Kay’s plea to the audience not to abandon their punch cards and magnetic tapes after their concordances were printed and (hopefully) published" (Joseph Raben, "Introducing Issues in Humanities Computing", Digital Humanities Quarterly, Vol. 1, No. 1 [2007])

On March 20, 2014, Joseph Raben posted his recollections of the conference to the Humanist Discussion Group (Vol. 27, No. 908), from which I quote:

"In September 1964 IBM organized at the same laboratory what it called a Literary Data Processing conference, primarily, I believe now, to publicize the project of Fr. Roberto Busa to generate a huge verbal index to the writings of Saint Thomas Aquinas and writers associated with him. IBM had underwritten this project and Fr. Busa, an Italian Jesuit professor of linguistics, had been able to recruit a staff of junior clergy to operate his key punches. The paper he read at this conference was devoted to the problems of managing the huge database he had created. IBM had persuaded The New York Times to send a reporter to the conference, and in the story he filed he chose to describe in some detail my paper on the Milton-Shelley project. The report of the eccentric professor who was trying to use a computer to analyze poetry caught the fancy of the news services, and the story popped up in The [London] Times and a few other major newspapers around the world.

What impressed me most at that conference, however, was the number of American academics who had been invited to speak about their use of the computer, often to generate concordances. Such reference works had, of course, long antedated the computer, having originated in the Renaissance, when the first efforts to reconcile the disparities among the four Gospels produced these alphabetized lists of keywords and their immediate contexts, from which scholars hoped to extract the "core" of biblical truth. The utility of such reference works for non-biblical literature soon became obvious, and for centuries, dedicated students of literature, often isolated in outposts of Empire, whiled away their hours of enforced leisure by copying headwords, lines and citations onto slips which then had to be manually alphabetized for the printer. Such concordances already existed for a small number of major poets, like Milton, Shelley and Shakespeare.

Apparently unrecognized by the earlier compilers of concordances was the concept that by restructuring the texts they were concording into a new order – here, alphabetical, but potentially into many others – they were creating a perspective radically different from the linear organization into which the texts had originally been organized. A major benefit to the scholar of this new structure is the ability to examine all the occurrences of individual words out of their larger contexts but in association with other words almost immediately adjacent. Nascent in this effort was the root of what we now conceive as a text database.

Some of this vision was becoming visible to the members of the avant-garde represented at the Literary Data Processing conference, who had generally taken up a program called KWIC (keyword in context) that IBM had "bundled" with its early computers, a program designed to facilitate control over scientific information. Because it selected keywords from article titles, it was recognized as a crude but acceptable mechanism for literary concordances, to the extent that Stephen M. Parrish had begun publishing a series for Victorian poets, and others at the conference reported on their work on Chaucer, Old English and other areas of literary interest. In hindsight it is evident that the greater significance of these initiatives was twofold: first, they made clear that even in their primitive state in the 1960s, computers could perform functions beyond arithmetic; and second, that another dimension of language study was available. From the beginning signaled by this small event would come a growing academic discipline covering such topics as corpus linguistics, machine translation, text analysis and literary databases.

Beyond the activity reported at that early conference, it became increasingly evident that computer-generated concordances could not only serve immediate scholarly needs but could also imply future applications of expanding value. Texts could be read non-linearly, in a variety of dimensions, with the entire vocabulary alphabetized, with the most common words listed first, with the least common words listed first, or with all the words spelled backwards (so their endings could be associated), and in almost any other manner that a scholar's imagination could conjure. Concordances could be constructed for non-poetic works, such as Melville's Moby-Dick or Freud's translated writings. Many poets of lesser rank than Shakespeare, Milton, and Chaucer could now be accorded the stature of being concorded, and even political statements could be made, as when the anti-Stalinist Russian [Osip] Mandelstam was exalted by having his poetry concorded. David W. Packard even constructed a concordance to Minoan Linear A, the undeciphered writing system of prehistoric Crete.

Looking beyond that group's accomplishment in creating the concordances and other tools they were reporting on, I had a vision of a newer scholarship, based on a melding of the approaches that had served humanities scholars for generations with the newer ones generated by the computer scientists who were struggling at that time to understand their new tool, to enlarge its capacities. Sensing that the group of humanists gathering for this pioneering conference could benefit from maintaining communication with each other beyond this meeting, I devoted some energy and persistence to persuading IBM to finance what I conceived first as a newsletter. Through the agency of Edmond A. Bowles, a musicologist who had decided he could support his family more successfully as an IBM executive than as a college instructor, I received a grant of $5000 (as well as a renewal in the same amount), a huge award at that time for an assistant professor of English and enough to impress my dean, who allowed me a course reduction so I could teach myself to be an editor. . . ."
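Raben's account describes what a computer concordance does. It lifts every word out of the linear text, files it under a new order (alphabetical, by frequency, or by reversed spelling), and keeps a few neighboring words as context. The short Python sketch below illustrates that idea under stated assumptions. It is not IBM's KWIC program or any program reported at the 1964 conference. The function names (`kwic`, `by_frequency`, `reverse_order`, `format_kwic`), the tokenizing rule, and the three-word context window are all hypothetical choices made for illustration. Unlike a production KWIC index, it indexes every word and uses no stop-word list.

```python
import re
from collections import Counter

# Hypothetical tokenizer (an assumption, not a historical rule):
# lowercase the text and keep runs of letters and apostrophes.
def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def kwic(text, width=3):
    """Keyword-in-context entries: (keyword, left context, right context),
    one per occurrence, sorted alphabetically by keyword. Python's sort is
    stable, so repeated keywords keep their original text order."""
    words = tokenize(text)
    entries = []
    for i, word in enumerate(words):
        left = " ".join(words[max(0, i - width):i])
        right = " ".join(words[i + 1:i + 1 + width])
        entries.append((word, left, right))
    return sorted(entries, key=lambda entry: entry[0])

def by_frequency(text, most_common_first=True):
    """Distinct words ordered by frequency; ties are broken alphabetically."""
    counts = Counter(tokenize(text))
    sign = -1 if most_common_first else 1
    return sorted(counts, key=lambda w: (sign * counts[w], w))

def reverse_order(text):
    """Distinct words sorted by their reversed spelling, so words with the
    same ending (here -at and -og) end up next to each other."""
    return sorted(set(tokenize(text)), key=lambda w: w[::-1])

def format_kwic(entries, pad=20):
    """Render entries as concordance lines with the keyword in a fixed column."""
    return [f"{left:>{pad}}  {kw.upper():<10}  {right}" for kw, left, right in entries]

if __name__ == "__main__":
    sample = "The cat sat on the mat and the dog sat on the log."
    for line in format_kwic(kwic(sample)):
        print(line)
    print(by_frequency(sample))
    print(reverse_order(sample))
```

The text itself is never altered. Each ordering is only a different sort key applied to the same list of words, which is the "perspective radically different from the linear organization" that Raben describes.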

Timeline Themes

Indexing & Searching Information

Fiction, Poetry, Theater; Literature

Digital Humanities

Related Entries

Roberto Busa:

In "As We May Think" Vannevar Bush Envisions Mechanized Information Retrieval and the Concept of Hypertext

Roberto Busa & IBM Adapt Punched Card Tabulating to Sort Words in a Literary Text: The Origins of Humanities Computing

J. W. Ellison Uses a UNIVAC 1 to Compile the First Computerized Concordance of the Bible

Publication of Roberto Busa's Index Thomisticus: Forty Years of Data Processing in the Humanities

Roberto Busa & Paul Tasman Produce a Computerized Concordance of the Dead Sea Scrolls

Ze-ev Ben Hayyim Founds the "Historical Dictionary of the Hebrew Language" as a Digital Humanities Project

Arader, Parrish & Bessinger Organize the First Humanities Computing or Digital Humanities Conference