The 2018 FIFA World Cup begins next week. In an era of sports analytics, legalized gambling, and online bracket challenges (thanks March Madness), it’s time for many of us to sit down and make our own World Cup predictions.
Motivated by an off-the-cuff ranking exercise that I assigned in my undergraduate course on soccer analytics earlier this spring, I describe a ranking method built around whichever country I want to do well at the World Cup (i.e., wishful thinking).
Before we start, we consider some readily available rankings. We only use objective rankings such as FIFA’s Official ranking, the ELO ranking which is based on the way chess players are ranked, Massey’s ranking which used to be part of the College Football Bowl Championship Series ranking, a betting-line aggregator, and UBS Bank’s ranking.
We only rank the countries that made the World Cup, so the worst-ranked country is always #32. We include each country’s predicted group stage outcome, so B2 means the country came in second in Group B. According to these five ranking methods, if the better-ranked country always wins, then we would expect the quarterfinal bracket to look like this.
|Match||FIFA||ELO-Football||Massey||Betting lines||UBS Bank|
|QF||*Spain, #9, B2||*Portugal, #6, B2||*Portugal, #8, B2||*Portugal, #8, B2||*Portugal, #8, B2|
|#1||France, #7, C1||France, #4, C1||France, #6, C1||France, #5, C1||France, #5, C1|
|QF||Brazil, #2, E1||Brazil, #1, E1||Brazil, #1, E1||Brazil, #2, E1||Brazil, #2, E1|
|#2||Belgium, #3, G1||England, #7, G1||Belgium, #5, G1||Belgium, #8, G1||England, #4, G1|
|QF||Portugal, #4, B1||Spain, #3, B1||Spain, #3, B1||Spain, #3, B1||Spain, #3, B1|
|#3||Argentina, #5, D1||Argentina, #5, D1||Argentina, #4, D1||Argentina, #4, D1||Argentina, #7, D1|
|QF||Germany, #1, F1||Germany, #2, F1||Germany, #2, F1||Germany, #1, F1||Germany, #1, F1|
|#4||Poland, #8, H1||*Belgium, #8, G2||*England, #7, G2||*England, #7, G2||*Belgium, #6, G2|
Note: Countries denoted (*) are runners-up in the group stage who defeated a worse-ranked group winner in the round of sixteen. For example, using FIFA, *Spain, #9, B2, the runner-up in Group B, would be expected to beat Uruguay, #13, A1, the winner of Group A, in the round of sixteen.
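The "better-ranked country always wins" rule can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this post, the pairings are the round-of-sixteen slots implied by the table above, and the only ranks used are the FIFA ones quoted in the note.

```python
# Round-of-sixteen pairings by group slot, ordered so that each
# consecutive pair of winners meets in one quarterfinal.
ROUND_OF_16 = [
    ("A1", "B2"), ("C1", "D2"),  # winners meet in QF #1
    ("E1", "F2"), ("G1", "H2"),  # winners meet in QF #2
    ("B1", "A2"), ("D1", "C2"),  # winners meet in QF #3
    ("F1", "E2"), ("H1", "G2"),  # winners meet in QF #4
]


def better_ranked(a, b):
    """Each entry is (country, rank); the smaller rank number wins."""
    return a if a[1] < b[1] else b


def advance(slots, pairings):
    """slots maps a group slot such as 'B2' to (country, rank).

    Returns the winner of every pairing, in order.
    """
    return [better_ranked(slots[x], slots[y]) for x, y in pairings]


# FIFA ranks quoted in the text: Uruguay #13 (A1), Spain #9 (B2), France #7 (C1).
fifa_slots = {"A1": ("Uruguay", 13), "B2": ("Spain", 9)}
round_of_16_winner = advance(fifa_slots, [("A1", "B2")])[0]
quarterfinal_winner = better_ranked(round_of_16_winner, ("France", 7))
```

With those numbers Spain advances past Uruguay and then loses the quarterfinal to France, which matches the first two rows of the FIFA column.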
Even though these five methods (which I briefly describe at the very bottom) consider different data points and output different ranking lists, the sets of teams predicted to play in the quarterfinals are identical in four out of five of these, with the outlier still overlapping considerably (seven out of eight countries). For full disclosure, I did not expect these rankings to produce such similar sets; I chose them because they were easy to find on the internet and didn’t realize the similarity until I was assembling the table.
Now, let’s suppose that our favorite soccer country is not one of the eight listed above. Let’s try to create a ranking method that (shamelessly) favors our country, whichever that one is.
To this end, one day towards the end of the semester I had my students try to find the most favorable ranking method for each of the 32 countries that qualified for the World Cup. We used a ranking tool called FIFAFoeFun!, which allows you to customize your own ranking method. (For full disclosure, I was part of the team that put FIFAFoeFun! together, and my collaborators, Tim Chartier, Mike Mossinghoff, and their students, are all from Davidson College, which hosts this blog.)
That day, we were able to find “wishful thinking” ranking methods that produced 19 different #1 seeds. For the remaining 13 countries, the most favorable ranking methods that we were able to find put 3 of them at #2, 5 at #3, 1 at #4, 2 at #7, 1 at #8, and 1 at #10. (Our ranking results are by no means exhaustive, and we remind the reader that this was a class exercise, not a formal research project.)
This led us to ask: if every country is given a chance to come up with their most favorable ranking method, what would happen, in aggregate? That is to say, what does the wisdom of the (very biased) masses tell us about the 2018 FIFA World Cup?
The eight countries expected to play in the quarterfinals are
|Match||Wishful Thinking|
|QF||*Portugal, #7, B2|
|#1||France, #5, C1|
|QF||Brazil, #1, E1|
|#2||Belgium, #6, G1|
|QF||Spain, #4, B1|
|#3||Argentina, #2, D1|
|QF||Germany, #3, F1|
|#4||Colombia, #8, H1|
practically the same set as before!
Even when we try to bias the ranking methods in the most self-serving sort of way, if we combine all 32 “wishful thinking” lists, then it is possible to come up with essentially the same set of countries making the quarterfinal round as everyone else is getting!
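For readers who want to try this at home, here is one way to combine several biased lists. I have not spelled out how we combined ours, so treat the averaging rule below (mean rank, in the spirit of a Borda count) as an assumption, and the countries X, Y, and Z as placeholders.

```python
from statistics import mean


def aggregate_by_mean_rank(rankings):
    """rankings is a list of dicts mapping country -> rank (1 is best).

    Returns the countries ordered from best to worst average rank.
    Every dict is assumed to rank the same set of countries.
    """
    countries = list(rankings[0])
    average = {c: mean(r[c] for r in rankings) for c in countries}
    return sorted(countries, key=lambda c: average[c])


# Three toy "wishful thinking" lists, each putting a different country first.
wishful_lists = [
    {"X": 1, "Y": 2, "Z": 3},  # favors X
    {"X": 2, "Y": 1, "Z": 3},  # favors Y
    {"X": 2, "Y": 3, "Z": 1},  # favors Z
]
consensus = aggregate_by_mean_rank(wishful_lists)
```

Each list crowns its own favorite, yet the average still puts X ahead of Y ahead of Z, because a country that is strong tends to sit near the top of everyone else's list too.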
On the one hand, it is comforting to know that most of the countries we expect to do well will most likely do well at the 2018 FIFA World Cup. On the other hand, history tells us that two or three of these teams will falter, either by not winning their group or by not even making it to the knock-out round. The challenge to winning your bracket is correctly predicting which two or three countries those will be, how they will falter, and, of course, who will win it all.
Enjoy creating your World Cup bracket!
Here are brief descriptions of the ranking methods used. More information can be found at the links we included above.
To obtain their ranking, FIFA weights the match result (win, loss, tie, shootout win) by the match’s importance (e.g., Friendly, World Cup Qualifier, World Cup Match), how long ago it was played (“date”), the opponent’s ranking, and both teams’ confederations. Depending on these factors, a win can be worth anything from about 80 to 2400 points.
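As a rough sketch of how those factors multiply together, here is the per-match formula as I recall it from FIFA's procedure at the time. The specific constants (result points, importance multipliers, the floor of 50 on opponent strength) and the function names are assumptions to be checked against FIFA's own documentation, and the date weighting is left out.

```python
# Constants recalled from FIFA's published procedure; treat them as assumptions.
RESULT_POINTS = {"win": 3, "shootout_win": 2, "draw": 1, "shootout_loss": 1, "loss": 0}
IMPORTANCE = {"friendly": 1.0, "qualifier": 2.5, "continental_final": 3.0, "world_cup": 4.0}


def opponent_strength(opponent_rank):
    """200 for the top-ranked team, otherwise 200 minus rank, never below 50."""
    return 200 if opponent_rank == 1 else max(50, 200 - opponent_rank)


def match_points(result, importance, opponent_rank, confed_a, confed_b):
    """Points for one match: result x importance x opponent strength x confederation.

    confed_a and confed_b are the two teams' confederation weights (at most 1.0);
    their average is used as the confederation factor.
    """
    confederation = (confed_a + confed_b) / 2
    return (RESULT_POINTS[result] * IMPORTANCE[importance]
            * opponent_strength(opponent_rank) * confederation)


# Best case: beating the #1 team at the World Cup, both confederations at full weight.
best = match_points("win", "world_cup", 1, 1.0, 1.0)
# Worst positive case: shootout win in a friendly over a very low-ranked team.
worst = match_points("shootout_win", "friendly", 180, 0.85, 0.85)
```

Under these assumed constants the two extremes come out at 2400 and roughly 85 points, in line with the range quoted above.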
ELO, which is based on the official ranking method for chess players, considers the margin of victory and the importance of the match, and adjusts for home field advantage. For each match, points are assigned based on how far the actual outcome is from the expected outcome. So, for example, in the event of a tie, the worse-ranked team gains points while the better-ranked team loses that same number of points.
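The tie example can be checked with the standard Elo update. This is a minimal sketch, not the exact football variant: the value of `k` and the ratings are made up, and margin of victory and match importance would normally be folded into `k`.

```python
def expected_score(rating, opponent_rating, home_advantage=0.0):
    """Standard Elo expectation, between 0 and 1.

    home_advantage is added to `rating` when that team plays at home.
    """
    diff = rating + home_advantage - opponent_rating
    return 1.0 / (1.0 + 10 ** (-diff / 400))


def elo_update(rating_a, rating_b, score_a, k=40.0, home_advantage_a=0.0):
    """score_a is 1 for a win, 0.5 for a tie, 0 for a loss, from team A's side.

    Returns the new ratings of A and B; whatever A gains, B loses.
    """
    change = k * (score_a - expected_score(rating_a, rating_b, home_advantage_a))
    return rating_a + change, rating_b - change


# A tie between a stronger team (1900) and a weaker team (1700) on neutral ground.
new_strong, new_weak = elo_update(1900, 1700, 0.5)
```

The stronger team was expected to score more than half a point, so a tie costs it rating points, and the weaker team picks up exactly the same amount.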
Massey’s main data points are score, venue (to account for home field advantage) and date, though matches can be weighted based on importance as well. All the teams’ ratings are computed simultaneously, which allows us to tie in teams’ strength of schedule. This computation is a nice application of linear algebra, and as we mentioned before, it was part of the College Football BCS Ranking method from 1999-2013.
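To make the linear algebra concrete, here is a bare-bones version of Massey's method that uses only score differences. It ignores venue, date, and match importance, the teams and scores are invented, and the small solver is written out by hand so the sketch has no dependencies.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - known) / aug[r][r]
    return x


def massey_ratings(teams, games):
    """games is a list of (team_a, team_b, goals_a, goals_b).

    Builds the Massey system M r = p and returns a dict team -> rating.
    The schedule must connect all teams, or the system has no unique solution.
    """
    n = len(teams)
    index = {t: i for i, t in enumerate(teams)}
    M = [[0.0] * n for _ in range(n)]
    p = [0.0] * n
    for a, b, goals_a, goals_b in games:
        i, j = index[a], index[b]
        M[i][i] += 1
        M[j][j] += 1
        M[i][j] -= 1
        M[j][i] -= 1
        p[i] += goals_a - goals_b
        p[j] += goals_b - goals_a
    # M is singular, so replace the last equation with "ratings sum to zero".
    M[-1] = [1.0] * n
    p[-1] = 0.0
    return dict(zip(teams, solve(M, p)))


toy_games = [("X", "Y", 2, 0), ("Y", "Z", 1, 0), ("X", "Z", 3, 0)]
ratings = massey_ratings(["X", "Y", "Z"], toy_games)
```

Because every rating appears in every other team's equation, beating a strong opponent counts for more than beating a weak one by the same score, which is how strength of schedule enters.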
Betting houses/bookmakers are not interested in predicting the outcome of the match with their betting-lines. Instead, they are interested in making the most money, which will happen if half of the money bet falls on one side of the betting-line, and half on the other. There is also a psychological aspect to betting that makes people more interested in certain games than others; I will not get into this.
UBS (and other banks) often use Monte Carlo simulations. These include random variables to quantify the chance of an upset. Their model might also consider known metrics that need not be directly related to soccer, such as the country’s GDP. Then the banks run several thousand simulations and count how many times each country came in 1st, 2nd, 3rd, and so on. Although I do not have a copy of UBS Bank’s World Cup document, it was widely reported in May. (See http://www.businessinsider.com/who-will-win-the-world-cup-2018-2018-5).
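I cannot reproduce UBS's model, but the general idea of a Monte Carlo simulation is easy to sketch. Everything here is an assumption made for illustration: the Elo-style win probability, the four placeholder teams and their ratings, and the fact that only the champion is counted rather than every finishing position.

```python
import random
from collections import Counter


def win_probability(rating_a, rating_b):
    """Elo-style chance that A beats B; any gap can still produce an upset."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def simulate_knockout(bracket, ratings, rng):
    """Play one single-elimination tournament and return the champion.

    bracket lists the teams in draw order; its length must be a power of two.
    """
    alive = list(bracket)
    while len(alive) > 1:
        winners = []
        for a, b in zip(alive[::2], alive[1::2]):
            a_wins = rng.random() < win_probability(ratings[a], ratings[b])
            winners.append(a if a_wins else b)
        alive = winners
    return alive[0]


def champion_counts(bracket, ratings, n_sims=10000, seed=0):
    """Repeat the tournament n_sims times and count each team's titles."""
    rng = random.Random(seed)
    return Counter(simulate_knockout(bracket, ratings, rng) for _ in range(n_sims))


toy_ratings = {"W": 2000, "X": 1500, "Y": 1500, "Z": 1500}
counts = champion_counts(["W", "X", "Y", "Z"], toy_ratings, n_sims=2000)
```

Dividing each count by the number of simulations gives an estimated probability of winning the tournament; the favorite wins most often, but not every time, which is the point of simulating rather than always advancing the better-ranked team.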