Indirectly we can make it count

We’re always trying to get as much out of our data as we can. Sometimes that means resorting to what we call indirect evidence: we might not have direct information for two points (e.g. players, teams, champions) that we want to compare, but by using a third point that we do have direct comparisons with, we can infer the relationship between the first two.

Let’s illustrate this with an easy football example – I’ll move on to a League example later, but that’s a little more complex! We have three football teams – Barcelona, Manchester United, and Scunthorpe United. As our direct evidence:

  • Man United has played Scunthorpe before and consistently beats them
  • Barcelona has played Man United before and consistently beats them

However, Scunthorpe has never played Barcelona, so we don’t have any direct evidence for comparing them. Does that mean we have to conclude that we don’t know who would win? Of course not – we would confidently predict that Barcelona would beat Scunthorpe, even though they’ve never played each other.

One way to formalise this mathematically is by defining a consistency equation:

\delta_{AC}=\delta_{AB}+\delta_{BC}

 

This basically means that the relative effect of A vs C is equal to the effect of A vs B plus the effect of B vs C (equivalently, A vs B equals A vs C minus B vs C), provided all other things are equal. That proviso is a key part of the assumption behind this formula.
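To make the consistency equation concrete, here is a minimal sketch applied to the football example. It assumes the relative effects are measured as log-odds of winning, which is my choice of scale for illustration, and the numbers are made up rather than taken from real results.

```python
import math

def indirect_effect(delta_ab: float, delta_bc: float) -> float:
    """Consistency equation: delta_AC = delta_AB + delta_BC."""
    return delta_ab + delta_bc

def win_probability(delta: float) -> float:
    """Convert a log-odds advantage into a win probability (logistic function)."""
    return 1.0 / (1.0 + math.exp(-delta))

# Hypothetical log-odds, not real results.
# A = Barcelona, B = Man United, C = Scunthorpe
delta_ab = 1.0  # Barcelona vs Man United (direct evidence)
delta_bc = 2.0  # Man United vs Scunthorpe (direct evidence)

# Barcelona vs Scunthorpe, inferred without them ever having played
delta_ac = indirect_effect(delta_ab, delta_bc)

print(f"Indirect log-odds, Barcelona vs Scunthorpe: {delta_ac:.2f}")
print(f"Implied win probability: {win_probability(delta_ac):.3f}")
```

The indirect advantage of A over C comes out larger than either direct advantage, which matches the intuition that Barcelona should be even more heavily favoured against Scunthorpe than against Man United.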

[Figure: consistency]

It seems simple enough, and it is with just three “nodes”. However, it gets increasingly complicated as we build up larger networks of direct and indirect evidence. Below are all the LoL matches played during Worlds 2015. The blue nodes represent the different teams (their size is proportional to the number of games that team played in the tournament). A line between two teams means there is direct evidence connecting them (i.e. they played each other at least once in the tournament).

[Figure: network plot of Worlds 2015 matches]

By using indirect evidence, we can get an idea of the outcome between teams that haven’t played each other (e.g. H2k and CLG). But we can also use the indirect evidence to supplement our direct evidence, giving us more information about the relationship between, say, OG and FW. This means we can make predictions about the outcome with greater certainty.
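One simple way to see why supplementing direct evidence with indirect evidence increases certainty is inverse-variance weighting. This is a sketch under assumptions, not the model used for the Worlds data: it treats the direct and indirect estimates as independent, and the estimates and variances below are hypothetical.

```python
def pool_inverse_variance(estimates):
    """Pool (estimate, variance) pairs, weighting each by 1 / variance.

    Assumes the estimates are independent and measure the same effect.
    """
    weights = [1.0 / var for _, var in estimates]
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / sum(weights)
    pooled_var = 1.0 / sum(weights)
    return pooled, pooled_var

# Hypothetical (estimate, variance) pairs for one matchup
direct = (0.40, 0.25)    # from games the two teams actually played
indirect = (0.60, 0.50)  # inferred through a common opponent

pooled, pooled_var = pool_inverse_variance([direct, indirect])

print(f"Direct only:       {direct[0]:.3f} (variance {direct[1]:.3f})")
print(f"Direct + indirect: {pooled:.3f} (variance {pooled_var:.3f})")
```

The pooled variance is always smaller than the variance of either source on its own, which is the "greater certainty" in numerical form. The noisier indirect estimate gets less weight, so it nudges the result rather than dominating it.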

Of course, as I mentioned before, the consistency equations require that all other things are equal, and in League they very rarely are from game to game! But the things that aren’t equal can, over multiple matches, be expected to be distributed randomly, meaning that our consistency equation still holds on average. We just have to account for the uncertainty due to the randomness, which we can do during modelling.
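A small simulation can illustrate the idea that random game-to-game differences wash out over many matches. Everything here is hypothetical: the true effects, the noise level, and the simplification that each match gives one noisy observation of the underlying relative effect.

```python
import random
import statistics

def estimate_effect(true_effect, n_matches, noise_sd, rng):
    """Average n noisy per-match observations of a true relative effect."""
    observations = [rng.gauss(true_effect, noise_sd) for _ in range(n_matches)]
    return statistics.fmean(observations)

def simulate_indirect(n_matches, seed, true_ab=0.5, true_bc=0.8, noise_sd=1.0):
    """Estimate A vs C indirectly from noisy A vs B and B vs C matches."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    est_ab = estimate_effect(true_ab, n_matches, noise_sd, rng)
    est_bc = estimate_effect(true_bc, n_matches, noise_sd, rng)
    return est_ab + est_bc  # consistency equation applied to the estimates

# The true indirect effect under these made-up values is 0.5 + 0.8 = 1.3
for n in (5, 50, 5000):
    print(f"{n:>5} matches per pair: indirect estimate {simulate_indirect(n, seed=2015):.3f}")
```

With only a handful of matches the indirect estimate can land well away from the true value, but it settles towards it as the number of matches grows. The spread at small sample sizes is the uncertainty that has to be carried into the model.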

This is the type of analysis I use in my job when comparing multiple scientific studies that look at multiple treatments (known as network meta-analysis). It’s how we identify which treatments work, and which ones don’t. Indirect evidence is all around us – we just have to find new ways to make use of it! After all, indirect evidence is just extra information, and information is beautiful.