Cities with a higher incidence of racist tweets reported more hate crimes related to race, ethnicity, and national origin, according to an analysis of the location and linguistic features of 532 million tweets published between 2011 and 2016.

A machine learning model identified and analyzed two types of tweets: targeted tweets, which directly espouse discriminatory views, and self-narrative tweets, which describe or comment on discriminatory remarks or acts. The team then compared the prevalence of each type of discriminatory tweet with the number of hate crimes reported in the same cities over the same time period.

The analysis included cities with a wide range of urbanization, varying degrees of population diversity, and different levels of social media usage. The team limited the dataset to tweets and bias crimes describing or motivated by discrimination based on race, ethnicity, or national origin. The Federal Bureau of Investigation categorizes and tracks these crimes, and crimes motivated by race, ethnicity, or national origin make up the largest proportion of hate crimes in the nation. While most tweets included in the analysis were posted by actual Twitter users, the team found that on average 8% of tweets containing targeted discriminatory language were generated by bots.

The self-narrative tweets showed the opposite pattern: the larger the proportion of a city's race-, ethnicity-, or national origin-based discriminatory tweets that were self-narrations of experiences, the fewer crimes motivated by the same biases that city tended to have.

The authors say the results represent one of the largest and most comprehensive analyses of discriminatory social media posts and real-life bias crimes in the United States. They caution, however, that the association is purely statistical: no known causal mechanism links social media hate speech to real-life acts of violence.