A: Obama's "lipstick on a pig" quip
For the first time, the Web has been used to track and attempt to measure the news cycle, the process by which information becomes news, competes for attention and fades, says the NY Times.
Researchers at Cornell developed algorithms to track frequently repeated short phrases, the equivalent of “genetic signatures” for ideas, or "memes," and story lines on blogs and mainstream media sites over three months (August through October, 2008).
Rise and fall of a star phrase
"As our principal domain of study," the authors say in their paper, "we show how such a meme-tracking approach can provide a coherent representation of the news cycle - the daily rhythms in the news media that have long been the subject of qualitative interpretation but have never been captured accurately enough to permit actual quantitative analysis."
"A succession of story lines that evolve and compete for attention within a relatively stable set of broader topics collectively produces an effect that commentators refer to as the news cycle. Tracking dynamic information at this temporal and topical resolution has proved difficult, since the continuous appearance, growth, and decay of new story lines takes place without significant shifts in the overall vocabulary; in general, this process can also not be closely aligned with the appearance and disappearance of specific named entities (or hyperlinks) in the text. As a result, while the dynamics of the news cycle has been a subject of intense interest to researchers in media and the political process, the focus has been mainly qualitative, with a corresponding lack of techniques for undertaking quantitative analysis of the news cycle as a whole."
The phrases they track vary considerably over short periods of time while the broader vocabulary remains stable, the authors note, and they are abundant enough to serve as "tracers" as each phrase mutates over time and as it spreads.
"From an algorithmic point of view, we consider these distinctive phrases to act as the analogue of 'genetic signatures' for different memes. And like genetic signatures, we find that while they remain recognizable as they appear in text over time, they also undergo significant mutation. As a result, a central computational challenge in this approach is to find robust ways of extracting and identifying all the mutational variants of each of these distinctive phrases, and to group them together. We develop scalable algorithms for this problem, so that memes end up corresponding to clusters containing all the mutational variants of a single phrase."
A fascinating finding of the study was the interplay between blogs and more traditional news media sources.
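To make the idea of grouping "mutational variants" concrete, here is a minimal Python sketch. It is not the authors' algorithm: it is a simplified stand-in that treats two phrases as variants when they share a run of consecutive words, then merges variants with union-find. The function names, the `min_overlap` threshold, and the sample phrases are all assumptions chosen for illustration, and the pairwise comparison would not scale to a corpus the size of the one in the paper.

```python
from collections import defaultdict


def tokens(phrase):
    """Lowercase a phrase and split it into words, replacing punctuation with spaces."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in phrase.lower())
    return cleaned.split()


def is_variant(a, b, min_overlap=4):
    """Assumed similarity rule: both phrases contain the same run of min_overlap words."""
    ta, tb = tokens(a), tokens(b)
    grams = {tuple(ta[i:i + min_overlap]) for i in range(len(ta) - min_overlap + 1)}
    return any(
        tuple(tb[i:i + min_overlap]) in grams
        for i in range(len(tb) - min_overlap + 1)
    )


def cluster_phrases(phrases, min_overlap=4):
    """Group phrases into clusters of variants using union-find."""
    parent = list(range(len(phrases)))

    def find(i):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Naive all-pairs comparison: quadratic, fine only for small inputs.
    for i in range(len(phrases)):
        for j in range(i + 1, len(phrases)):
            if is_variant(phrases[i], phrases[j], min_overlap):
                parent[find(j)] = find(i)

    clusters = defaultdict(list)
    for i, phrase in enumerate(phrases):
        clusters[find(i)].append(phrase)
    return list(clusters.values())


# Illustrative input only; these strings are not taken from the study's data set.
sample = [
    "you can put lipstick on a pig",
    "you can put lipstick on a pig but it's still a pig",
    "lipstick on a pig",
    "the fundamentals of our economy are strong",
    "fundamentals of our economy are strong",
]
for cluster in cluster_phrases(sample):
    print(cluster)
```

With this sample, the three "lipstick" phrases fall into one cluster and the two "economy" phrases into another. A production approach would need a more tolerant similarity measure (small word substitutions, for instance) and an indexing scheme that avoids comparing every pair.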
Which comes first, the blog or the newswire?
As you may expect, traditional news media has a slight lead time on blogs - but not much. The typical lag time is about 2.5 hours, the authors found (Figure 8).
"The peak of news-media attention of a phrase typically comes 2.5 hours earlier than the peak attention of the blogosphere. Moreover, if we look at the proportion of phrase mentions in blogs in a few-hour window around the peak, it displays a characteristic 'heartbeat'-type shape as the meme bounces between mainstream media and blogs." (Figure 9)
One interpretation is that a quoted phrase first becomes "high volume" among news sources, the authors say, and then is "handed off" to blogs, where it is discussed for much longer.
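The lag between the two peaks can be illustrated with a short Python sketch. It assumes each phrase comes with hourly mention counts for news sites and for blogs, finds the peak hour of each series, and takes the median lag across phrases. The function names and the counts are invented for illustration; the paper's own measurement procedure may differ in detail.

```python
from statistics import median


def peak_hour(counts):
    """Return the index (hour) of the largest count; ties go to the earliest hour."""
    return max(range(len(counts)), key=lambda h: counts[h])


def peak_lag_hours(news_counts, blog_counts):
    """Hours by which the blog peak trails the news peak (negative if blogs peak first)."""
    return peak_hour(blog_counts) - peak_hour(news_counts)


def typical_lag(threads):
    """Median peak lag over a list of (news_counts, blog_counts) pairs."""
    return median([peak_lag_hours(news, blogs) for news, blogs in threads])


# Made-up hourly mention counts for two phrases (not data from the study).
threads = [
    ([1, 4, 9, 5, 2, 1], [0, 1, 3, 6, 8, 4]),  # news peaks at hour 2, blogs at hour 4
    ([2, 7, 3, 1, 0, 0], [0, 1, 2, 4, 5, 9]),  # news peaks at hour 1, blogs at hour 5
]
print(typical_lag(threads))  # 3.0
```

Using the median rather than the mean keeps a handful of unusually slow or fast phrases from dominating the summary, which is one reasonable way to read "typical" lag.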
Most of the fastest blogs were political in nature, they found (Table 1).
The top three fastest blogs were hotair.com, talkingpointsmemo.com, and politicalticker.blogs.cnn.com. The huffingtonpost.com and dailykos.com were big players too, among others.
Interestingly, large media organizations like washingtonpost.com and ap.org are behind the blogs; these two are about 16 hours behind the fastest blog.
Does it go both ways, though? In fact, it does - while the majority of phrases first appear in news media and then diffuse to blogs, there are also phrases that propagate in the opposite direction, percolating in the blogosphere until they are picked up by the news media, the authors note. They found that about 3.5% of quoted phrases originate in blogs and trickle into mainstream media, against the grain.
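As a rough illustration of how such a share could be computed, the Python sketch below labels a phrase as blog-originated when its earliest recorded mention is a blog post, then reports the fraction of such phrases. This "earliest mention" rule is an assumption made for the example; the authors' actual criterion may be stricter. The data layout and all names are hypothetical.

```python
def originates_in_blogs(mentions):
    """mentions: non-empty list of (hour, source_type), source_type 'blog' or 'news'."""
    _, first_source = min(mentions, key=lambda m: m[0])
    return first_source == "blog"


def blog_origin_share(phrase_mentions):
    """Fraction of phrases whose earliest mention came from a blog."""
    if not phrase_mentions:
        return 0.0
    blog_first = sum(1 for m in phrase_mentions.values() if originates_in_blogs(m))
    return blog_first / len(phrase_mentions)


# Invented example: four phrases, one of which surfaces on a blog first.
phrase_mentions = {
    "phrase a": [(0.0, "news"), (2.5, "blog")],
    "phrase b": [(1.0, "blog"), (9.0, "news")],
    "phrase c": [(3.0, "news"), (4.0, "news"), (6.0, "blog")],
    "phrase d": [(0.5, "news")],
}
print(blog_origin_share(phrase_mentions))  # 0.25
```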
News cycle and species interaction
The authors draw parallels to biology, suggesting there are "interesting potential analogies to natural systems that contain dynamics similar to what one sees in the news cycle."
"For example, one could imagine the news cycle as a kind of species interaction within an ecosystem, where threads play the role of species competing for resources (in this case media attention, which is constant over time), and selectively reproducing (by occupying future articles and posts)," they write.
"Similarly, one can see analogies to certain kinds of biological regulation mechanisms such as follicular development, in which threads play the role of cells in an environment with feedback where at most one or a few cells tend to be dominant at any point in time. However, the news cycle is distinct in that there is a constant influx of new threads on a time scale that is comparable to the rate at which competition and selective reproduction is taking place."
So, what can you do with meme-tracking?
The tracking "opens an opportunity to pursue long-standing questions that before were effectively impossible to tackle. For example, how can we characterize the dynamics of mutation within phrases? How does information change as it propagates? Over long enough time periods, it may be possible to model the way in which the essential 'core' of a widespread quoted phrase emerges and enters popular discourse more generally."
For now, though, you can have fun playing with their interactive graphs:
Most mentioned phrases during the 2008 U.S. presidential campaign
Top phrases between August 2008 and February 2009
Phrases containing "economy"
Top phrases in the last month of 2008
Graphs, documents, comparisons and much, much more!
Leskovec et al. KDD '09, June 28-July 1, 2009, Paris, France.