Cover of Science magazine showing a book sculpture by Matej Krén

"Tens of thousands of books appear in this photograph of the interior of the sculpture Idiom, by Matej Krén. On page 176, Jean Baptiste Michel et al. describe an even larger collection: a 5.2-million-book corpus containing 4% of all books ever published. Statistical analysis of this corpus makes it possible to study cultural trends quantitatively. Original sculpture (Municipal Library of Prague): Matej Krén/Photograph: Zdeněk Urbánek"


A: Mountain View, California, United States, B: Cambridge, Massachusetts, United States

The Cultural Observatory at Harvard Introduces Culturomics

12/16/2010

On December 16, 2010, a highly interdisciplinary group of scientists, primarily from Harvard University (Jean-Baptiste Michel, Yuan Kui Shen, Aviva P. Aiden, Adrian Veres, Matthew K. Gray, The Google Books Team, Joseph P. Pickett, Dale Hoiberg, Dan Clancy, Peter Norvig, Jon Orwant, Steven Pinker, Martin A. Nowak, and Erez Lieberman Aiden), published "Quantitative Analysis of Culture Using Millions of Digitized Books" in Science (published online December 16, 2010; in print Science, 14 January 2011, Vol. 331, no. 6014, pp. 176-182, DOI: 10.1126/science.1199644).

The authors were associated with the following organizations: Program for Evolutionary Dynamics; Institute for Quantitative Social Sciences; Department of Psychology; Department of Systems Biology; Computer Science and Artificial Intelligence Laboratory; Harvard Medical School; Harvard College; Google, Inc.; Houghton Mifflin Harcourt; Encyclopaedia Britannica, Inc.; Department of Organismic and Evolutionary Biology; Department of Mathematics; Broad Institute of Harvard and MIT, Cambridge; School of Engineering and Applied Sciences; Harvard Society of Fellows; Laboratory-at-Large.

This paper from the Cultural Observatory at Harvard and collaborators was the first major publication resulting from the Google Labs N-gram (Ngram) Viewer,

"the first tool of its kind, capable of precisely and rapidly quantifying cultural trends based on massive quantities of data. It is a gateway to culturomics! The browser is designed to enable you to examine the frequency of words (banana) or phrases ('United States of America') in books over time. You'll be searching through over 5.2 million books: ~4% of all books ever published" (http://www.culturomics.org/Resources/A-users-guide-to-culturomics, accessed 12-19-2010).

"We constructed a corpus of digitized texts containing about 4% of all books ever printed. Analysis of this corpus enables us to investigate cultural trends quantitatively. We survey the vast terrain of 'culturomics', focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000. We show how this approach can provide insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship, and historical epidemiology. 'Culturomics' extends the boundaries of rigorous quantitative inquiry to a wide array of new phenomena spanning the social sciences and the humanities" (http://www.sciencemag.org/content/early/2010/12/15/science.1199644, accessed 12-19-2010).

"The Cultural Observatory at Harvard is working to enable the quantitative study of human culture across societies and across centuries. We do this in three ways: Creating massive datasets relevant to human culture; Using these datasets to power wholly new types of analysis; Developing tools that enable researchers and the general public to query the data" (http://www.culturomics.org/cultural-observatory-at-harvard, accessed 12-19-2010).

NOTE: When I returned to this site in September 2020, its content appeared unchanged from ten years earlier.
