The Library of Congress Has Archived 170 Billion Tweets

1/4/2013

On January 4, 2013, Gayle Osterberg, Director of Communications at the Library of Congress, reported in the Library of Congress Blog:

"An element of our mission at the Library of Congress is to collect the story of America and to acquire collections that will have research value. So when the Library had the opportunity to acquire an archive from the popular social media service Twitter, we decided this was a collection that should be here.  

"In April 2010, the Library and Twitter [based in San Francisco] signed an agreement providing the Library the public tweets from the company’s inception through the date of the agreement, an archive of tweets from 2006 through April 2010. Additionally, the Library and Twitter agreed that Twitter would provide all public tweets on an ongoing basis under the same terms.

"The Library’s first objectives were to acquire and preserve the 2006-10 archive; to establish a secure, sustainable process for receiving and preserving a daily, ongoing stream of tweets through the present day; and to create a structure for organizing the entire archive by date.

"This month, all those objectives will be completed. We now have an archive of approximately 170 billion tweets and growing. The volume of tweets the Library receives each day has grown from 140 million beginning in February 2011 to nearly half a billion tweets each day as of October 2012.  

"The Library’s focus now is on addressing the significant technology challenges to making the archive accessible to researchers in a comprehensive, useful way. These efforts are ongoing and a priority for the Library.  

"Twitter is a new kind of collection for the Library of Congress but an important one to its mission. As society turns to social media as a primary method of communication and creative expression, social media is supplementing, and in some cases supplanting, letters, journals, serial publications and other sources routinely collected by research libraries.  [Bold face is my addition, JN.]

"Although the Library has been building and stabilizing the archive and has not yet offered researchers access, we have nevertheless received approximately 400 inquiries from researchers all over the world. Some broad topics of interest expressed by researchers run from patterns in the rise of citizen journalism and elected officials’ communications to tracking vaccination rates and predicting stock market activity.

"Attached is a white paper [PDF] that summarizes the Library’s work to date and outlines present-day progress and challenges."

————

James Gleick, author of The Information, responded in the New York Review of Books on January 16, 2013, in a blog entry titled "Librarians of the Twitterverse," from which I quote this selection:

"For a brief time in the 1850s the telegraph companies of England and the United States thought that they could (and should) preserve every message that passed through their wires. Millions of telegrams—in fireproof safes. Imagine the possibilities for history!  

"'Fancy some future Macaulay rummaging among such a store, and painting therefrom the salient features of the social and commercial life of England in the nineteenth century,' wrote Andrew Wynter in 1854. (Wynter was what we would now call a popular-science writer; in his day job he practiced medicine, specializing in 'lunatics.') 'What might not be gathered some day in the twenty-first century from a record of the correspondence of an entire people?'

"Remind you of anything?  

"Here in the twenty-first century, the Library of Congress is now stockpiling the entire Twitterverse, or Tweetosphere, or whatever we’ll end up calling it—anyway, the corpus of all public tweets. There are a lot. The library embarked on this project in April 2010, when Jack Dorsey’s microblogging service was four years old, and four years of tweeting had produced 21 billion messages. Since then Twitter has grown, as these things do, and 21 billion tweets represents not much more than a month’s worth. As of December, the library had received 170 billion—each one a 140-character capsule garbed in metadata with the who-when-where. . . . "
