What’s so great about video games?

That’s a pretty silly question really…of course video games are great, does there really need to be a reason? Well, everyone has their own reasons for loving video games, and you wouldn’t be here reading this if you didn’t love them. But I want to talk about why, as a statistician, I love video games.

Video games, and in particular e-sports, have huge potential for data analysis. We can collect vast quantities of data on all sorts of attributes…after all, video games are made from ones and zeros, the same ones and zeros that can be used in an analysis. So all we have to do is collect it and utilise it. In this post, I’m going to explain a little more about why e-sports are so suited to statistical analysis.

Statistical modelling in sport really took off in the 1990s, and quite quickly people began to see the huge benefit of this approach to team management in terms of selecting strategies, team composition and new players. The book (and film) Moneyball: The Art of Winning an Unfair Game is a great example of this. Just like in conventional sports, we can make use of statistics in e-sports to help improve our game, and to select the best players (and champions in the case of LoL). However, there are a couple of big advantages that e-sports have over conventional sports in terms of the data that we collect.

[Image source: California Genealogical Society and Library]

Ease of data collection

Every attribute, position and status of a player can (in theory) be programmatically collected at any time point during a game, which gives us an almost inexhaustible quantity of data to work with. Of course, not all of this data is always available, but some game developers, like Riot, are beginning to understand its value and to provide detailed datasets from individual games, allowing teams to make personalised analyses of their own data.

In contrast, collecting a lot of the data from conventional sports requires real people to make decisions and classifications and to record these decisions. This brings us to our next issue.

 

Measurement error and bias

In conventional sports, there is always a certain degree of error when measuring a particular variable (or attribute). The speed of a ball, the position of a player – for some sports many variables like these are recorded using electronic measurements, but even these measurements will always have a certain degree of error. These errors can be random, in which case they reduce the precision of our statistical estimates, or they can systematically vary in one direction (e.g. over-estimation of a value), in which case they introduce a bias to the results. Rather than observing the true value X, we observe W, which is equal to X plus some random or non-random measurement error, U.
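To make the W = X + U idea a little more concrete, here is a minimal simulation sketch. Everything in it is an assumption made purely for illustration: the "true" ball speeds are invented, the error U is assumed to be normally distributed, and the helper name `observe` is hypothetical. It simply compares purely random error (U centred on zero), systematic error (U centred on a non-zero bias) and the case with no error at all.

```python
import random
import statistics


def observe(true_values, error_sd, bias=0.0, seed=0):
    """Return W = X + U for each true value X, with U ~ Normal(bias, error_sd).

    Hypothetical helper for illustration only: bias=0 gives purely random
    error, bias != 0 gives a systematic shift in one direction.
    """
    rng = random.Random(seed)
    return [x + rng.gauss(bias, error_sd) for x in true_values]


# Invented "true" ball speeds in km/h (assumption: not real data).
speed_rng = random.Random(42)
true_speeds = [speed_rng.gauss(100.0, 10.0) for _ in range(10_000)]

random_only = observe(true_speeds, error_sd=5.0, bias=0.0, seed=1)
biased = observe(true_speeds, error_sd=5.0, bias=3.0, seed=1)
exact = observe(true_speeds, error_sd=0.0, bias=0.0, seed=1)  # U is always 0

true_mean = statistics.fmean(true_speeds)
for label, values in [("true X", true_speeds),
                      ("random error", random_only),
                      ("biased error", biased),
                      ("no error", exact)]:
    shift = statistics.fmean(values) - true_mean
    spread = statistics.pstdev(values)
    print(f"{label:>12}: mean shift = {shift:+.2f}, spread (sd) = {spread:.2f}")
```

With random error only, the average stays close to the truth but the values are more spread out, so estimates become less precise. With a non-zero bias, the whole average shifts, and collecting more data does not fix it.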

For less objective variables that require classification by a human (such as the type of shot played or the severity of an injury), the potential for measurement error and bias is even greater.

[Figure: measurement error following a normal distribution]

In video games there can be no measurement error. Any and all data on a game can be recorded exactly as it is in the game, because the data defines the game. This prevents additional random error from imprecise measurements and, more importantly, also prevents biases that can be introduced during measurement. For example, a referee in football may (consciously or unconsciously) more readily give a yellow card to a player they dislike. These types of biases cannot exist in video games, and therefore cannot confuse our analyses.

 

So overall, this is why we love video games. For a statistician, access to large datasets of precise, comparatively unbiased data is a prospect that makes my calculator display error messages of excitement. Having said that, collecting the data is just the first stage of the process. Correctly analysing and interpreting the data and the relationships that exist in it is what really makes the difference. This is what we aim to achieve through the models we develop and publish, models that you can run on your own personalised data. We’ll post more about that later though…