Our most beloved works of fiction hide well-trodden narratives: people want them, people expect them by now, and big data analysis can identify them. It turns out that most fiction is based on far fewer storylines than you might have imagined.

To reach this conclusion, big data scientists worked with colleagues from natural language processing to analyze the narrative in more than a thousand works of fiction. By deconstructing some of the magic of narrative in fiction, they confirmed that six common ways of telling a story can be found time and time again in popular works. They were inspired by US author Kurt Vonnegut, who originally proposed the similarity of emotional storylines in a master's thesis that was rejected by the University of Chicago.

The authors selected 1,327 books, representative of English works of fiction, from the 50,000 books included in Project Gutenberg, a major open access literature digitization project. They then applied three different natural language processing filters used for sentiment analysis to extract the emotional content of 10,000-word stories.
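To make the idea of extracting emotional content concrete, here is a minimal sketch of one way to turn a text into an emotional arc: average a per-word happiness score over a sliding window. The lexicon, its scores, the window size and the function name are all invented for illustration; this is not the authors' code or data.

```python
# Toy word-score lexicon. The words and scores are invented for illustration only.
TOY_LEXICON = {"happy": 8.0, "love": 8.5, "win": 7.5,
               "sad": 2.0, "death": 1.5, "lose": 2.5}


def window_scores(words, window, step, lexicon=TOY_LEXICON):
    """Return the average lexicon score of each sliding window of words.

    Words missing from the lexicon are skipped; a window with no scored
    words yields None.
    """
    scores = []
    last_start = max(len(words) - window, 0)
    for start in range(0, last_start + 1, step):
        hits = [lexicon[w] for w in words[start:start + window] if w in lexicon]
        scores.append(sum(hits) / len(hits) if hits else None)
    return scores


# A ten-word "story" that starts happily and ends badly.
text = "happy love win win happy sad death lose sad death".split()
arc = window_scores(text, window=4, step=2)
print(arc)  # one score per window, falling over the course of the text
```

A real analysis would use far larger windows and a lexicon with thousands of rated words; the toy values here only show the mechanics.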

Annotated emotional arc of Harry Potter and the Deathly Hallows, by JK Rowling. Credit: 10.1140/epjds/s13688-016-0093-1

The first filter, singular value decomposition, reveals the underlying basis of the emotional storyline. The second, hierarchical clustering, helps differentiate between groups of emotional storylines. The third, a type of neural network, uses a self-learning approach to sort the actual storylines from the background noise. Used together, these three approaches provide robust findings, as documented on the hedonometer.org website.
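As a rough illustration of what singular value decomposition contributes, the sketch below finds the single dominant shape shared by a handful of arcs. It uses power iteration, a simple stand-in that approximates only the leading singular vector, rather than a full decomposition. The toy arcs, the choice to centre each arc on its own mean, and the function name are assumptions made for this example, not details taken from the study.

```python
import math


def dominant_mode(arcs, iterations=200):
    """Approximate the leading right singular vector of the centred arc matrix.

    Each row of `arcs` is one story's arc sampled at the same number of
    points. Power iteration on A^T A is used here as a simple substitute
    for a full singular value decomposition.
    """
    n_points = len(arcs[0])
    centred = []
    for arc in arcs:
        mean = sum(arc) / len(arc)
        centred.append([x - mean for x in arc])  # keep the shape, drop the level

    v = [1.0 + i for i in range(n_points)]  # arbitrary non-zero starting vector
    for _ in range(iterations):
        # u = A v, then v = A^T u, then renormalise to unit length
        u = [sum(a * b for a, b in zip(row, v)) for row in centred]
        v = [sum(centred[r][j] * u[r] for r in range(len(centred)))
             for j in range(n_points)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v


# Invented toy arcs: two rising stories and one falling story.
toy_arcs = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [5.0, 4.0, 3.0, 2.0, 1.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
]
mode = dominant_mode(toy_arcs)
print([round(x, 3) for x in mode])
```

In this toy case every arc is a scaled or flipped copy of one straight line, so a single mode describes all three stories. A rising story has a positive weight on the mode and a falling story has a negative one.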

Andrew Reagan from the University of Vermont and colleagues thus determined that there were only six main emotional storylines: ‘rags to riches’ (sentiment rises), ‘riches to rags’ (fall), ‘man in a hole’ (fall-rise), ‘Icarus’ (rise-fall), ‘Cinderella’ (rise-fall-rise) and ‘Oedipus’ (fall-rise-fall).
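The six shapes can be read as patterns of rises and falls. The sketch below is a deliberately naive illustration: it collapses a list of sentiment scores into its sequence of rises and falls and looks that sequence up in a table. It is not how the study classified books, and it would need smoothing before being used on noisy real arcs; the function name and sample scores are hypothetical.

```python
# Rise/fall patterns for the six storylines named in the article.
SHAPES = {
    ("rise",): "Rags to riches",
    ("fall",): "Riches to rags",
    ("fall", "rise"): "Man in a hole",
    ("rise", "fall"): "Icarus",
    ("rise", "fall", "rise"): "Cinderella",
    ("fall", "rise", "fall"): "Oedipus",
}


def arc_shape(scores):
    """Name the storyline whose rise/fall pattern matches `scores`.

    Consecutive moves in the same direction are merged into one segment;
    flat steps are ignored. Returns "no match" for any other pattern.
    """
    pattern = []
    for prev, cur in zip(scores, scores[1:]):
        if cur == prev:
            continue
        direction = "rise" if cur > prev else "fall"
        if not pattern or pattern[-1] != direction:
            pattern.append(direction)
    return SHAPES.get(tuple(pattern), "no match")


print(arc_shape([5, 3, 2, 4, 6]))        # fall then rise
print(arc_shape([1, 3, 5, 4, 2, 3, 6]))  # rise, fall, rise
```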

This approach could, in turn, be used to create compelling stories by gaining a better understanding of what has previously made for great storylines. It could also help teach common sense to artificial intelligence systems.

Citation: A. J. Reagan, L. Mitchell, D. Kiley, C. M. Danforth and P. Sheridan Dodds (2016), The emotional arcs of stories are dominated by six basic shapes, Eur. Phys. J. Data Science, 5:31, DOI 10.1140/epjds/s13688-016-0093-1