This will sadly be my last blog post before exams :( I have 5 exams coming up in 2 weeks, so I will be prioritising my time accordingly this month. My next post will likely be about the PhD topic I’ve chosen (we find out roughly after exams).
Usually I try to avoid writing about events I’ve attended, as I have a feeling I’d make them sound worse than they were. But our STOR-i trip down to ATASS was extremely interesting, so I decided to break my rule. ATASS is a company that applies statistical modelling to sports, and they come up with creative solutions that blow existing methods out of the water. From what I gather, the company started after one of the founders came up with something relatively simple but effective: the “Dixon-Coles model”.
Idea behind the model
This section can be easily summarised in the meme below:

sports meme
Pay careful attention to the picture, as it’s much more informative than you think. Suppose the extremely intelligent individual being interviewed in the meme is the captain of the home team. Translating the meme, we get the following:
- “We Sports our best”, from the top right, breaks down into two aspects in the bottom left:
  - “We need to stop the other team from scoring points” gives the model a parameter \(\beta_{home}\), describing the home team’s defence.
  - “While we ourselves score many points” gives a parameter \(\alpha_{home}\), saying how strong the home team’s attack is.
- “The other team was sportsing too”:
  - Similarly to the home team, the away team gets its own two parameters, \(\alpha_{away}\) (attack) and \(\beta_{away}\) (defence).
- Sadly, the meme fails to mention the last parameter, \(\gamma\), which gives the home team’s parameters a strength bonus, since teams playing at home are likely to perform better.
These parameters are then combined in such a way that they form two Poisson rate parameters: one for the rate at which the home team scores goals, and one for the rate at which the away team scores goals.
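For reference, the original Dixon and Coles paper combines the parameters multiplicatively. Writing \(\lambda\) for the home team’s scoring rate and \(\mu\) for the away team’s (symbols chosen here for illustration), the rates are

\[
\lambda = \alpha_{home}\,\beta_{away}\,\gamma, \qquad \mu = \alpha_{away}\,\beta_{home}.
\]

Under this form a larger \(\beta\) means a leakier defence, since it raises the opponent’s scoring rate, and \(\gamma>1\) gives the home side its advantage. The full model also includes a small adjustment for the low-scoring results 0-0, 1-0, 0-1 and 1-1, which is left out here.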
Transforming this into a classification problem
From here, what we essentially want is the probability that the home team will win, draw or lose. To do this we consider the two random variables \(H\) and \(Y\), the home goals and away goals respectively, each Poisson distributed with the corresponding rate from above. We then compute the probabilities of the following 3 outcomes:
- Home win: \(p_{win}=P(H>Y)\)
- Home loss: \(p_{lose}=P(H<Y)\)
- Draw: \(p_{draw}=P(H=Y)\)
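The three probabilities can be computed by summing over every scoreline. Below is a minimal sketch, assuming \(H\) and \(Y\) are independent Poisson variables (the plain version of the model, without any low-score correction) and ignoring scores above a cap of 15 goals per side. The function names, the goal cap and the example rates are all hypothetical, not ATASS’s actual code.

```python
import math


def poisson_pmf(k: int, rate: float) -> float:
    """Probability that a Poisson(rate) variable equals k."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def match_outcome_probs(home_rate: float, away_rate: float, max_goals: int = 15):
    """Return (p_win, p_draw, p_lose) from the home team's point of view.

    Assumes home goals H and away goals Y are independent Poisson variables.
    Scorelines with more than max_goals for either side are ignored; for
    football-sized rates the probability lost this way is negligible.
    """
    p_win = p_draw = p_lose = 0.0
    for h in range(max_goals + 1):
        p_h = poisson_pmf(h, home_rate)
        for y in range(max_goals + 1):
            p = p_h * poisson_pmf(y, away_rate)  # P(H = h, Y = y) under independence
            if h > y:
                p_win += p
            elif h == y:
                p_draw += p
            else:
                p_lose += p
    return p_win, p_draw, p_lose


# Hypothetical rates: the home side expects 1.5 goals, the away side 1.1.
p_win, p_draw, p_lose = match_outcome_probs(1.5, 1.1)
print(f"win {p_win:.3f}  draw {p_draw:.3f}  lose {p_lose:.3f}")
```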
This gives us something similar to the LDA model we looked at before: each outcome, or class, has a probability associated with it, which we can then use to make predictions.
Odds
Seeing as the whole motivation behind the model is beating the bookmakers, we need to transform these probabilities into odds, i.e. something you can bet with. Take \(p_{win}=0.1\), for instance.
Suppose we bet £10 on the home team winning, and the bookmaker gives us odds \(k\), meaning we get \(10\times k\) back if we win. If the bookmaker is fair, our expected return equals our stake: \(0.1\times 10\times k=10\), so \(k=10\).
Essentially, what we want is to beat the bookies. To do this we’d need \(k>10\). If \(k<10\), the bookies are expected to make money off of us.
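The fair-odds arithmetic above is easy to check in code. This sketch assumes decimal odds, where a winning stake \(s\) returns \(s\times k\) in total (stake included), matching the £10 example; the helper names are hypothetical.

```python
def fair_odds(p: float) -> float:
    """Decimal odds at which a bet on an event with probability p breaks even."""
    return 1.0 / p


def expected_profit(p: float, stake: float, k: float) -> float:
    """Expected profit from staking `stake` at decimal odds k.

    With probability p we get stake * k back (stake included); the stake
    itself is always paid, so profit = p * stake * k - stake.
    """
    return p * stake * k - stake


print(fair_odds(0.1))                # 10.0
print(expected_profit(0.1, 10, 12))  # 2.0: odds above fair, we expect to gain
print(expected_profit(0.1, 10, 8))   # -2.0: odds below fair, the bookies expect to gain
```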
You never beat the bookies…
The problem with odds is that nobody actually knows \(p_{win}\), bookmakers generally know it better than you do, and they offer more conservative odds than the fair ones. As a result, the bookies almost always have an expected gain over you.
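One way to see the bookmakers’ conservative odds is to invert the decimal odds they quote for a home win, a draw and a home loss. At fair odds these implied probabilities, \(1/k\), would sum to exactly 1; bookmakers’ odds typically imply a sum above 1, and the excess (usually called the overround, or margin) is the bookies’ built-in edge. A small sketch, using made-up odds and hypothetical helper names:

```python
def implied_probabilities(odds):
    """Convert decimal odds into the probabilities they imply (1 / k each)."""
    return [1.0 / k for k in odds]


def overround(odds):
    """How far the implied probabilities of all outcomes sum above 1.

    Fair odds give 0; a positive value is the bookmaker's built-in margin.
    """
    return sum(implied_probabilities(odds)) - 1.0


# Made-up odds for home win, draw and home loss.
quoted = [2.0, 3.4, 3.8]
print(f"{overround(quoted):.3f}")  # 0.057
```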
Stats won for once!
The Dixon-Coles model actually did beat the bookmakers back when it was introduced… but now that it’s public, it’s unlikely to still be better than what the bookies use, as people will have improved on it and kept their secrets.