So the USA lost to Belgium in the World Cup elimination round. I predicted a win for the US for a simple reason - Belgium, I said, does not know how good it is, whereas the US does. 

That's fuzzy logic, right? Well, that is what a lot of sports analysis is, because analysis at its heart relies on subjective scouting. Pundits can pretend to science it up all they want, but they are just doing a Bayesian analysis, updating on real results after they happen. Something like a 68% chance of a victory is useless in the real world unless you are a bookie. It sounds science-y, but sports is a 0 or a 1. Anything in between is a waste of time.

Google Cloud Platform thinks it is Bayes on steroids. For its World Cup predictions, the team took data from Opta, built derived features in Google BigQuery, did the modeling with pandas and let Compute Engine produce an answer. By comparison, they found that their logistic regression did better than the Poisson regression other groups use.
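For readers who want to see the difference between the two approaches, here is a minimal sketch. It is not Google's model: the feature score and the goal rates are hypothetical placeholders I made up for illustration, and the Poisson version assumes each side's goals are independent. A logistic regression maps a single score straight to a win probability, while a Poisson approach models each team's goals and then adds up the scorelines.

```python
import math


def logistic_win_probability(feature_score: float) -> float:
    """Squash a weighted feature score (hypothetical) into a win probability."""
    return 1.0 / (1.0 + math.exp(-feature_score))


def poisson_pmf(goals: int, rate: float) -> float:
    """Probability of scoring exactly `goals` when the expected goals is `rate`."""
    return math.exp(-rate) * rate ** goals / math.factorial(goals)


def poisson_outcome_probabilities(rate_a: float, rate_b: float, max_goals: int = 10):
    """Sum scoreline probabilities into (win, draw, loss) for team A.

    Assumes the two teams' goal counts are independent and ignores the
    negligible chance that either side scores more than `max_goals`.
    """
    win = draw = loss = 0.0
    for a in range(max_goals + 1):
        for b in range(max_goals + 1):
            p = poisson_pmf(a, rate_a) * poisson_pmf(b, rate_b)
            if a > b:
                win += p
            elif a == b:
                draw += p
            else:
                loss += p
    return win, draw, loss


# Hypothetical inputs, not taken from any real model or match data.
p_logistic = logistic_win_probability(0.9)
p_win, p_draw, p_loss = poisson_outcome_probabilities(1.6, 1.0)
print(f"logistic: {p_logistic:.2f}")
print(f"poisson:  win {p_win:.2f}, draw {p_draw:.2f}, loss {p_loss:.2f}")
```

The practical difference is that the Poisson version has to account for draws, while the logistic version goes directly to the yes-or-no question a knockout round asks.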

It's certainly better than Paul the Octopus. 

Here is what they came up with, so we can see how they did. Since teams will either win or lose, 68% versus 81% is an intellectual placebo; they have to make the call:
  • Brazil vs. Colombia: Brazil (71%)
  • France vs. Germany: France (69%)
  • Netherlands vs. Costa Rica: Netherlands (68%)
  • Argentina vs. Belgium: Argentina (81%)
All well and good. Statistically, a lot of people will get all of the games correct. Many got the USA game correct. While the USA and Belgium looked close in the final score, it didn't look close on paper. And it wasn't close in the game; goalkeeper Tim Howard set a record for most saves in a World Cup match.
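As a quick back-of-the-envelope check on what those four percentages mean taken together, here is a small sketch. It assumes the four matches are independent of one another, which is my simplification and not something the Google team stated, and it simply multiplies the quoted probabilities.

```python
import math

# Win probabilities quoted in the list above.
picks = {
    "Brazil over Colombia": 0.71,
    "France over Germany": 0.69,
    "Netherlands over Costa Rica": 0.68,
    "Argentina over Belgium": 0.81,
}

# Assumption: the matches are independent, so the chance of all four
# picks being right is the product of the individual probabilities.
p_all_correct = math.prod(picks.values())
print(f"Chance all four picks are right: {p_all_correct:.1%}")
```

Under that assumption the model gives itself only about a 27% chance of sweeping the round, even though every single pick is a favorite.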

But sports are catching on to real data - the truly predictive kind. What used to be the sole domain of baseball is now commonplace everywhere; every poker hand on television shows odds of success. The NFL, which used to rely heavily on scouting combines, now even quantifies personality traits numerically. Players projected by media to go early in the draft drop to the mid-20s or even the second round because teams are using metrics television commentators are in the dark about.

What about four years from now? DJ Pangburn of Motherboard has some ideas. He isn't correct on everything (he spells Oakland GM Billy Beane's name wrong and thinks the man introduced sabermetrics to baseball), but he sees a glimmer of hope for sports in Near Future Laboratory's Winning Formula project. I see a glimmer of hope for Science 2.0.

Pangburn notes the German Football Association partnered with enterprise software company SAP to bring the company's Match Insights software into its training program. "In just 10 minutes, 10 players can produce over 7 million data points."

That's overkill in football but might be ideal if we want to optimize a wheat genome.

Here's Scott Smith of Changeist talking about Winning Formula. Just transpose that to science and we can see what science might look like in eight years. Why not four? Because science is a large constituency, with $120 billion in funding annually, but it doesn't have the rabid fan base of sports. So it will take a little longer for us.