On March 11, 2009, Johan Bollen of Los Alamos National Laboratory and six co-authors published "Clickstream Data Yields High Resolution Maps of Science" in the open-access online journal PLoS ONE. Their map was based on clickstream data recorded when online readers moved from one journal to another, which yielded about one billion data points. That is a far larger dataset than citation analysis provides, and presumably one more reflective of actual reading patterns. Citation analysis, the earlier method developed by the Institute for Scientific Information (later part of Thomson Scientific, publisher of the Web of Science), traces the relationships among the footnotes in scholarly journals.
"Maps of science derived from citation data visualize the relationships among scholarly publications or disciplines. They are valuable instruments for exploring the structure and evolution of scholarly activity. Much like early world charts, these maps of science provide an overall visual perspective of science as well as a reference system that stimulates further exploration. However, these maps are also significantly biased due to the nature of the citation data from which they are derived: existing citation databases overrepresent the natural sciences; substantial delays typical of journal publication yield insights in science past, not present; and connections between scientific disciplines are tracked in a manner that ignores informal cross-fertilization.
"Scientific publications are now predominantly accessed online. Scholarly web portals provide access to publications in the natural sciences, social sciences and humanities. They routinely log the interactions of users with their collections. The resulting log datasets have a set of attractive characteristics when compared to citation datasets. First, the number of logged interactions now greatly surpasses the volume of all existing citations. This is illustrated by Elsevier's announcement, in 2006, of 1 billion (1×10⁹) article downloads since the launch of its Science Direct portal in April 1999. In contrast, around the time of Elsevier's announcement, the total number of citations in Thomson Scientific's Web of Science from the year 1900 to the present does not surpass 600 million (6×10⁸). Second, log datasets reflect the activities of a larger community as they record the interactions of all users of scholarly portals, including scientific authors, practitioners of science, and the informed public. In contrast, citation datasets only reflect the activities of scholarly authors. Third, log datasets reflect scholarly dynamics in real-time because web portals record user interactions as soon as an article becomes available at the time of its online publication. In contrast, a published article faces significant delays before it eventually appears in citation datasets: it first needs to be cited in a new article that itself faces publication delays, and subsequently those citations need to be picked up by citation databases.
"Given the aforementioned characteristics of scholarly log data, we investigated a methodological issue: can valid, high resolution maps of science be derived from clickstream data and can clickstream data be leveraged to yield meaningful insights in the structure and dynamics of scholarly behavior? To do this we first aggregated log datasets from a variety of scholarly web portals, created and analyzed a clickstream model of journal relationships from the aggregate log dataset, and finally visualized these journal relationships in a first-ever map of science derived from scholarly log data" (http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0004803#pone.0004803-Brody1, accessed 03-19-2009).
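To make the clickstream idea concrete, here is a minimal Python sketch of how journal-to-journal transitions might be counted from a portal's request logs. It is not the authors' actual pipeline. It assumes a hypothetical log format of (session_id, timestamp, journal) records, and the function name journal_transitions is purely illustrative. The resulting counts are the kind of weighted journal network that could then be analyzed and laid out as a map.

```python
from collections import Counter, defaultdict

# Hypothetical log record: (session_id, timestamp, journal) for one article request.
LogRecord = tuple[str, int, str]


def journal_transitions(records: list[LogRecord]) -> Counter:
    """Count how often readers moved from one journal to a different one
    within the same session (a simplified first-order clickstream model,
    assumed here for illustration only)."""
    by_session: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for session_id, timestamp, journal in records:
        by_session[session_id].append((timestamp, journal))

    counts: Counter = Counter()
    for events in by_session.values():
        events.sort()  # order each session's requests by time
        journals = [journal for _, journal in events]
        # Pair each request with the next one in the same session.
        for src, dst in zip(journals, journals[1:]):
            if src != dst:  # keep only switches between different journals
                counts[(src, dst)] += 1
    return counts


if __name__ == "__main__":
    logs = [
        ("s1", 1, "Nature"), ("s1", 2, "Science"), ("s1", 3, "Science"),
        ("s2", 5, "Cell"), ("s2", 4, "Nature"),
    ]
    print(journal_transitions(logs))
```

In this toy log, session s1 contributes one Nature→Science switch (the repeated Science request is ignored), and session s2, once sorted by timestamp, contributes one Nature→Cell switch. Treating each session as an ordered sequence is a simplifying assumption; a real study would also need to handle session boundaries, duplicate requests, and normalization across portals.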