Archive for Information Overload

Stop publishing so much already!

KLerman

Got your attention? Now that I have it, I would like to take a few minutes to discuss the role of limited attention and information overload in science. Attentive acts such as reading a scientific paper (or a tweet), answering an email, or watching a video require mental effort, and since the human brain’s capacity for effort is limited (by its oxygen and glucose consumption requirements), so is attention. Even if we could spend all of our time reading papers or answering emails, there is only so much we could read in 24 hours. In reality, we spend far less time attending to things, like science papers, before we get tired, bored, or distracted by other demands of our busy lives.

The situation is actually worse. Not only is attention limited, but we must also divide it among a rapidly proliferating number of information sources. Every minute on the Web thousands of new blog posts are written, hours of video are uploaded to YouTube, and hundreds of thousands of status updates are posted on Facebook and Twitter. The number of scientific papers posted to Arxiv.org (see figure) has grown steadily since its inception to more than 7000 a month! Even if you console yourself thinking that only a small fraction of papers is relevant to you, I am willing to bet that the number of papers submitted to the conferences you care about, not to mention the number of conferences and journals themselves, has also grown over the years. What this adds up to is rather nasty case of information overload.

Figure: Monthly submission rate to arxiv.org over more than 21 years. (Source: arxiv.org)
image1

The information overload, coupled with limited attention, reduces the likelihood anyone will notice a specific paper (or another item of information). As a consequence, even real gems will often fail to attract attention, and fade from collective awareness as new items appear on the scene.

Figure: Distribution of time to first citation of papers published by the American Physical Society.
image2

The collective neglect is apparent in the figure above, which reports the time to first citation versus the age of a paper published in the journals of the American Physical Society, a leading venue for publishing physics research. A newly published paper is very quickly forgotten. After a paper is a year old, its chances of getting discovered drop like a rock!

Rising Inequality

One of the puzzles of modern life is that with so much information created daily, people are increasingly consuming more of the same information. Every year, more people watch the same movies, read the same books and cite the same papers than in the previous year. With so many videos available on YouTube, it is a wonder that hundreds of millions have chosen to watch “Gangnam style” video instead.

Figure: Gini coefficient measuring inequality of citations received by APS paper over several decades.
image3

More alarmingly, this trend has only gotten worse. The gap between those who are “rich” in attention and those who are “poor” has grown steadily. One way to measure attention inequality is to look at the distribution of the number of citations. The figure above shows the gini coefficient of the number of citations received by physics papers in different decades. Gini coefficient, a popular measures inequality, is zero when all papers receive the same number of citations, and one paper gets all the citations. Though the gini coefficient of physics citations is already high in 1950s, it manages to grow over the subsequent decades. What this means is that a shrinking fraction of papers is getting all the citations. Yes, the rich (in attention) just keep getting richer.

Figure: Gini coefficient measuring inequality of box office revenues of movies that came out in different years.
image4

Incidentally, inequality is rising not only in science, but also in other domains, presumably for the same reasons. Take, for example, movies. Though the total box office revenues have been rising steadily over the years, this success can be attributed to an ever-shrinking number of huge blockbusters. The figure above shows the gini coefficient of box office revenues of 100 top-grossing movies that came out in different years. Again, inequality is rising, though not nearly as badly in Hollywood as among scientists!

Cognitive Biases

When attention is scarce, the decisions about how to allocate it can have dramatic outcomes. Social scientists have discovered that people do not always rationally weigh alternatives, relying instead on a variety of heuristics, or cognitive shortcuts, to quickly decide between many options. Our study of scientific citations and social media has identified some common heuristics people use to decide which tweets to read or scientific papers to cite. It appears that information that is easy to find receives more attention. People typically read Web pages from the top down; therefore, items appearing at the top of the page have greater visibility. This is the reason why Twitter users are more likely to read and respond to recent messages from friends, which appear near the top of their Twitter stream. Older messages that are buried deep in the stream may never be seen, because users leave Twitter before reaching them.

Visibility also helps science papers receive more citations. A study by Paul Ginsparg, creator of the Arxiv.org repository of science papers, confirmed an earlier observation that articles that are listed at the top of Arxiv’s daily digest receive more citations than articles appearing in lower positions. A paper is also easier to find when other well-read papers cite it. Such indirect exposure increases a paper’s visibility, and the number of new citations it receives. However, being cited by a paper with a long bibliography will not results in many new citations, due to the greater effort required to find a specific item in a longer list. In fact, being cited by a review paper is a kiss of death. Not only is it newer, and scientists prefer to cite more recent papers, but also a review paper typically makes hundreds of references, decreasing the likelihood of discovery for any paper in this long list.

Mitigating Information Overload

No individual can keep pace with the growing deluge of information. The heuristics and mental shortcuts we use to decide what information to pay attention to can have non-trivial consequences on how we create and consume information. The result is not only growing inequality and possible neglect of high quality papers. I believe that information overload can potentially stifle innovation, for example, by creating inefficiencies in the dissemination of knowledge. Already the inability to keep track of relevant work (not only because there is so much more relevant work, but also because we need to read so much more to discover it) can lead researchers to unwittingly duplicate existing results, expending effort that may have been better spent on something else. It has also been noted that the age at which an inventor files his or her first patent has been creeping up, presumably because there is so much more information to digest before creating an innovation. A slowing pace of innovation, both scientific and technological, can threaten our prosperity.

Short of practicing unilateral disarmament by writing fewer papers, what can a conscientious scientist do about information overload? One way that scientists can compensate for information overload is by increasing their cognitive capacity via collaborations. After all, two brains can process twice as much information. A trend towards larger collaborations has been observed in all scientific disciplines (see Wuchty et al. “The Increasing Dominance of Teams in Production of Knowledge”), although coordinating social interactions that come with collaboration can also tax our cognitive abilities.

While I am a technological optimist, I do not see an algorithmic solution to this problem. Although algorithms could monitor people’s behavior to pick out items that receive more attention than expected, there is the danger that algorithmic prediction will become a self-fulfilling prophecy. For example, a paper that is highly ranked by Google Scholar will get lots of attention whether it deserves it or not. In the end, it may be better tools for coordinating social interactions of scientific teams, coupled with algorithms that direct collective attention to efficiently evaluate content, that will provide some relief from information overload. It better be a permanent solution that scales with the continuing growth of information.

Blogger’s Profile:
Kristina Lerman is a Project Leader at the University of Southern California Information Sciences Institute and holds a joint appointment as a Research Associate Professor in the USC Computer Science Department. After a brief stint as a theoretical roboticist, she found her calling in blending together methods from physics, computer science and social science to address problems in social computing and social media analysis. She writes many papers that are greatly enjoyed by all of their twenty readers.