1,400,000 70-dimension histogram vectors about gamer behavior.  What can you do with that?

Me? Nothing; the math is too much. But if you can make sense of it, there is a lot of data in the ongoing online gaming phenomenon known as World of Warcraft (WoW). Keeping track of all that data would seem to be solely the purview of computer programmers, but sociologists are starting to take notice. Some people are just goofing off and writing a book about their experiences, but others see gold in 'game mining': the anthropological insights we can get by seeing what 10 million people do in a virtual, controlled setting over a period of years.

Because World of Warcraft is both persistent and ongoing, the game tracks everything a character does. For those who want to take advantage of that data, Christian Thurau, a post-doctoral researcher at Fraunhofer Intelligent Analysis and Information Systems (IAIS), gave a tutorial on analyzing such massive data sets a few months ago at CIG2010 (the IEEE Conference on Computational Intelligence and Games). If monstrous amounts of data can be corralled properly, the same techniques could be a huge benefit in other fields.

His idea was to dissect how 'guilds' in World of Warcraft evolved. Unlike historical real-life guilds, World of Warcraft guilds are not groups of craftsmen or people plying a similar trade, but rather social gatherings of people with similar interests or a desire for mutual protection. Why did they pick that game? Pure numbers, as this graphic shows:

[Graphic: World of Warcraft users]

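Now, 1.4 million 70-dimension histograms is not an extreme amount of data, but it isn't trivial either, so his method uses basis vectors (archetypes) to express each guild as a convex combination of a few archetypal guilds. A convex combination is just a weighted blend in which the weights are non-negative and add up to 1.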
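For the programmers in the audience, here is a minimal sketch of what "a convex combination of archetypes" means in practice. It is not Thurau's code or the API of his module; the function names (`project_to_simplex`, `mix`, `fit_weights`), the toy 3-bin histograms standing in for the 70-bin ones, and the use of plain projected gradient descent are all my own assumptions for illustration.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for i, u_i in enumerate(u, start=1):
        cumulative += u_i
        t = (cumulative - 1.0) / i
        if u_i - t > 0:
            theta = t  # keep the threshold from the last index that passes
    return [max(x - theta, 0.0) for x in v]


def mix(archetypes, weights):
    """Blend the archetype histograms using the given weights."""
    dims = len(archetypes[0])
    return [sum(w * a[d] for w, a in zip(weights, archetypes))
            for d in range(dims)]


def fit_weights(target, archetypes, steps=2000, lr=0.1):
    """Find convex weights whose blend best matches target (least squares)."""
    k = len(archetypes)
    weights = [1.0 / k] * k  # start from an even blend
    for _ in range(steps):
        residual = [m - t for m, t in zip(mix(archetypes, weights), target)]
        grad = [2.0 * sum(r * a_d for r, a_d in zip(residual, a))
                for a in archetypes]
        stepped = [w - lr * g for w, g in zip(weights, grad)]
        weights = project_to_simplex(stepped)  # stay non-negative, sum to 1
    return weights


# Hypothetical toy data: three "archetypal guilds" as 3-bin histograms.
archetypes = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
# A guild that is 50% like the first, 30% like the second, 20% like the third.
guild = mix(archetypes, [0.5, 0.3, 0.2])
weights = fit_weights(guild, archetypes)
print([round(w, 3) for w in weights])
```

The point of the constraint is interpretability: because the weights are non-negative and sum to 1, each guild can be read as "this much like archetype A, this much like archetype B" rather than as an arbitrary sum of positive and negative components.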

How can this help sociology? For one thing, why people join the guilds they do, why they choose particular races, each with its own biological benefits and flaws, and why they actively dislike other races or people in the game have to be of intense interest to the social sciences. Not to mention the marrying and the killing that can occur. And there is going to be interest in the benefits of virtual sex if enough data can be found.

The issue then, as Thurau discussed in his talk, will be how to parse out the meaningful basis vectors. His method of archetype analysis was to pick out the guilds that are most different from each other and then express every other guild as a combination of those. 'Archetype' is a bad word in the social sciences because it sounds like stereotyping, but stereotypes exist for a reason. Archetypes are not a way to round up and label entire groups; they can contain meaningful levels of societal understanding, and they are a way to understand outliers as well.
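To make "most different from each other" concrete, here is a small sketch that greedily picks mutually distant histograms from a data set. To be clear about the assumptions: this is a simple farthest-point heuristic I am swapping in for illustration, not the Simplex Volume Maximization method from Thurau's paper, and the function names and toy data are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two histograms of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pick_extremes(points, k):
    """Greedily choose k indices of points that are far from one another."""
    dims = len(points[0])
    mean = [sum(p[d] for p in points) / len(points) for d in range(dims)]
    # Seed with the point farthest from the average guild.
    chosen = [max(range(len(points)),
                  key=lambda i: distance(points[i], mean))]
    while len(chosen) < k:
        # Score each point by its distance to the nearest already-chosen one,
        # then take the point with the largest such score.
        def gap(i):
            return min(distance(points[i], points[j]) for j in chosen)
        chosen.append(max(range(len(points)), key=gap))
    return chosen


# Hypothetical toy data: three extreme guilds and three middling ones.
guilds = [
    [0.90, 0.05, 0.05],
    [0.05, 0.90, 0.05],
    [0.05, 0.05, 0.90],
    [0.40, 0.30, 0.30],
    [0.30, 0.40, 0.30],
    [0.34, 0.33, 0.33],
]
print(sorted(pick_extremes(guilds, 3)))
```

On this toy data the three lopsided guilds get selected and the middling ones do not, which is the intuition behind archetypes: the extremes define the corners, and everyone else is a blend of them.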

Some conclusions are apparent right away. Even in his analysis, limited strictly to WoW guilds, one finding was that there was no meaningful difference between the EU and US. That's right, even more than 200 years after separation from Europe and 60 years after unchecked immigration halted, Europeans and Americans in groups still have much more in common than not when it comes to social clusters.

Here's his talk:



And all of the talks from CIG 2010 can be found online here.

More reading on archetype analysis using World of Warcraft:

Christian Thurau, Kristian Kersting, and Christian Bauckhage. Convex Non-Negative Matrix Factorization in the Wild.

Christian Thurau, Kristian Kersting, and Christian Bauckhage.  Yes We Can – Simplex Volume Maximization for Descriptive Web-Scale Matrix Factorization.

Code:

Thurau made his code (also for the large-scale archetypal analysis variant) available in the Python Matrix Factorization Module.