Thursday, 19 June 2014

A brief introduction to statistics and football

Time for another non-archaeological post!

I've noticed a trend this year, particularly on the BBC, to rely increasingly on "more sophisticated" statistical analyses as part of their commentary. Don't get me wrong: statistics can be useful, but never have they been so obviously abused. It is about time that the licence fee payer (i.e. you) threw the book of "lies, damned lies and statistics" at the BBC's sport section, and told them off for making generalisations that are either unsupported or just plain wrong.

N.B. I am not a Manchester United, Sunderland, or Chelsea supporter. Make of it what you will, but my good friends will be able to tell you that I support Coventry City Football Club. Having now stated my biases, let's get on with the stats.

So you have always had these fairly "basic" stats in the game, to do with possession, shots, fouls, and so on:

Match stats

These are flawed in their own way, but fine, I can live with that. They're the obvious parts of a game. Most commentators only use them as a general indicator, and they don't rely on space as an argument (except for corners). But now you're seeing this, which is my main concern:
Average position of players in Chelsea v Manchester United

Key to average position graphic

What is wrong with the mean in space (I hear you cry)?

The above map shows the average positions of the Chelsea players (on the left) and Manchester United players (on the right), based on something known as the mean centre (or central mean). The mean centre, objectively speaking, is the sum of all of the positions the player was in over a given period of time (e.g. the full 90 minutes, or all the time before they were substituted on 75 minutes), divided by the total number of positions recorded. Confused? Don't panic! Imagine you have a graph with x along the bottom and y up the side, and you work out the mean of the x values and the mean of the y values separately (so as not to confuse them).
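To make that concrete, here is a minimal Python sketch of the calculation. The four positions are made-up numbers (not real tracking data), and the function name `mean_centre` is just my own label for illustration.

```python
from statistics import mean

# Hypothetical (x, y) positions in metres for one player.
# These are invented for illustration, not taken from any match.
positions = [(10.0, 20.0), (30.0, 40.0), (50.0, 30.0), (70.0, 50.0)]


def mean_centre(points):
    """Return the mean of the x values and the mean of the y values."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return mean(xs), mean(ys)


centre = mean_centre(positions)
print(centre)  # (40.0, 35.0)
```

Notice that the player was never actually recorded at (40, 35) in this toy data. The mean centre is a summary point, not a place the player necessarily stood.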

You then mark both means on the same map, which gives you two lines. Find where they meet, et voilà! The mean centre has been found for that player over the time period. But what does this actually tell you about each player? Does it mean that they stood rooted to the spot? Of course not (unless they are statues). The next map below illustrates the point about movement better:

Touches against Man Utd and heat map over last 10 Chelsea games

Key to touches and heat map
This heat map shows every touch made in the game by two Chelsea players, and where they were as a percentage of the game. Look how dispersed the movement is! It's not just an up/down motion, it's sideways, it's all over the final third. In truth, it's similar to the heat maps you will see for a lot of attacking players in their positions. It would be interesting to see the mean centre of all those passes (and their direction), although that would be even less useful than the mean centre of the player.
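The point about dispersion can be sketched in a few lines of Python. The two players below are invented, and "standard distance" (the root of the mean squared distance from the mean centre) is a measure I am bringing in for illustration; it is not something the BBC graphics show.

```python
import math


def mean_centre(points):
    """Mean of the x values and mean of the y values."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def standard_distance(points):
    """Root of the mean squared distance from the mean centre."""
    cx, cy = mean_centre(points)
    squared = [(x - cx) ** 2 + (y - cy) ** 2 for x, y in points]
    return math.sqrt(sum(squared) / len(points))


# Hypothetical players: one rooted to the spot, one roaming widely.
statue = [(50.0, 30.0)] * 4
roamer = [(20.0, 10.0), (80.0, 50.0), (20.0, 50.0), (80.0, 10.0)]

print(mean_centre(statue), standard_distance(statue))
print(mean_centre(roamer), standard_distance(roamer))
```

Both players end up with the same mean centre of (50.0, 30.0), yet one never moved and the other covered a large area of the pitch. An average position graphic on its own cannot tell them apart.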

But where do these passes go? There are also maps to show the short and long passes, and where they were made in relation to the rest of the field. You can even tell which player each one passed to the most, and who passes to whom the most, and infer the in-field relationships from this.

But, ultimately, none of this actually tells you WHY a goal was scored, or a shot was blocked, or any of the other football-related reasons for winning or losing a game. Sorry, but there is no magic formula to win a game. Just hard work, being good, eating your vegetables and knowing far more about football than I do. These will all contribute more to scoring goals than picking over every game, and where it went wrong, using statistics alone. It will also make you feel better to know that you can get more shots on goal by simply putting the practice in than by computing the most likely outcome.

Coming back to the problem of central mean, why is it a "forced marriage"? Means were not designed for space, only for quantities and other non-spatially defined stats.

He uses statistics as the crux of his argument, and you can see that he was right, but only in some respects, and he is not really helped by his own diagrams. Yes, the Chelsea defence are relatively deep. But the statistics also show that Manchester United's attack was actually in similar, deep, positions. So does this show that their pace was neutralised? Not really. More likely, the statistics show that the offside trap was being used, and this is not a sign of pace, but of tactical and player awareness. Furthermore, both sets of players are likely to be running around for corners, free kicks and so on, and this will "skew" the results, so the mean may not actually reflect the "true" position of the player at all!! To quote Savage from the same article: "United were all over the place positionally for the second goal after clearing Chelsea's initial corner." This will affect the mean, but the incident that led to a goal is in no way reflected by the central mean. Speed is beneficial if you want to beat the offside trap, but that's only half the battle.
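As a rough illustration of that "skew", here is a small Python sketch with invented numbers: a defender who spends most of the game about 25 metres from his own goal line, but goes up for a couple of corners. The sample counts and distances are assumptions of mine, not figures from the match.

```python
from statistics import mean

# Hypothetical distances (in metres) from the player's own goal line.
open_play_x = [25.0] * 18   # 18 samples in his usual defensive position
corner_x = [95.0] * 2       # 2 samples up in the opposition box for corners

mean_open_play = mean(open_play_x)
mean_all = mean(open_play_x + corner_x)

print(mean_open_play)  # 25.0
print(mean_all)        # 32.0
```

Two trips forward out of twenty samples are enough to drag the "average position" 7 metres up the pitch, to a spot where this made-up defender never actually stood.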

So part of the problem is trying to use statistics to explain your point, and knowing how to use them. None of these stats can actually depict the game's movements. It is better for the commentator to describe to you what caused the goal, or even better, go and watch the actual game itself, if you want to know how to win games and have entertaining football!!


Statistics should only be used to explain the team game; the more components you add to a statistical analysis, the more it relates to the whole, and less to the individual players (if that makes sense). It should be noted that statistics are useful, but not as an argument unto themselves.

All images from the League Cup semi-final between Manchester United and Sunderland, 22/01/2014, or from analysis of the Premier League game between Chelsea and Manchester United, 19/01/2014.