4406 entries. 94 themes. Last updated December 26, 2016.

Auto-Encoding of Documents for Information Retrieval (1959)

In 1959 computer scientist Hans Peter Luhn published "Auto-Encoding of Documents for Information Retrieval Systems," in M. Boaz (ed.), Modern Trends in Documentation (1959), 45-58.

"Luhn believed that the growing rate of information and document production necessitated the invention of methods allowing data to be retrieved from stores of documents without expensive human intervention. This paper discusses auto-encoding based on statistical procedures performed by a machine on the original text of a document already in machine-readable form. The prevalent machine-readable form of that time was primarily punched cards or paper tape and less frequently magnetic tape. The auto-encoding method used word frequency rates, a special thesaurus, and the development of multi-dimensional patterns based on word proximity. At the time, application of the method was limited to articles of 500 to 5000 words, but Luhn was confident that the logical capabilities of electronic machines, statistical methods, and "further research into the characteristics of human behavior as manifested in writing" would lead to better information dissemination and retrieval. Earlier articles by this author discuss the automatic creation of abstracts and the development of thesauri" (http://www.ischool.utexas.edu/~ssoy/organizing/l391d2b.htm, accessed 04-26-2009).