Soccer is an unpredictable game. A team can outshoot its opponent and create far better chances, yet still lose. Likewise, a team playing poorly can get one big chance in the whole match and convert it for the game's only goal. The 2018 FIFA World Cup has already produced many outcomes and statistics that few expected. How can we model the tournament to find the teams with the best chances of winning? A couple of measures can give us a good sense of how likely team A is to defeat team B, based on data from previous international games. One such method estimates the expected number of goals a team should score against a particular opponent. From that expected value, we can use the Poisson distribution to find the probability of a team scoring a specific number of goals, and from those probabilities derive win, loss, and draw percentages.
International teams are tricky to evaluate because they play few games over the long four-year cycle between World Cups. Compared with a regular yearly season, there is much more turnover among players and team personnel, so the sample of predictive games is smaller than we would like. To create probabilities for the third round of group stage matches, I used each country's previous ten games against top-80 national teams, including its first two World Cup matches. Goals scored per game alone is not enough, because the strength of the opposing defense also influences how many goals are scored. So we normalize each team's goals scored and goals allowed per game by comparing them with the average for all 32 World Cup teams. In their last 10 games before the final group stage game, World Cup teams scored 1.425 goals per game and allowed 1.1375 per game. Dividing a team's goals scored and goals allowed by these averages produces its Offensive and Defensive Strength values. An offensive number over 1 signifies an above-average World Cup offense, and a defensive number under 1 signifies an above-average defense. For example, Spain's 2.5 goals per game translates to an offensive strength of 1.75, and its 0.9 goals allowed per game results in a defensive strength of about 0.8.
These two values let us measure how a defense affects an offense's scoring. To find the expected goals of an offense against a defense, we multiply the offense's value by the defense's value and by the average goals per game over the last ten games played by each World Cup side (1.281). The result is the expected number of goals scored in 90 minutes by a team's offense against its opponent's defense. We do the same with the opponent's offense and the first team's defense to find the opponent's expected goals.
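As a rough sketch of the arithmetic above, the snippet below computes the strength ratios and an expected goals value in Python. The averages and Spain's per-game numbers come from the text; the function names and the opponent's defensive strength of 1.1 are made up purely for illustration.

```python
# Averages quoted in the text for the 32 World Cup teams.
AVG_GOALS_SCORED = 1.425
AVG_GOALS_ALLOWED = 1.1375
AVG_GOALS_PER_GAME = 1.281


def offensive_strength(goals_scored_per_game):
    """Ratio of a team's scoring rate to the World Cup average."""
    return goals_scored_per_game / AVG_GOALS_SCORED


def defensive_strength(goals_allowed_per_game):
    """Ratio of a team's conceding rate to the World Cup average."""
    return goals_allowed_per_game / AVG_GOALS_ALLOWED


def expected_goals(offense, opponent_defense):
    """Expected goals for an offense facing a given defense."""
    return offense * opponent_defense * AVG_GOALS_PER_GAME


spain_offense = offensive_strength(2.5)   # about 1.75
spain_defense = defensive_strength(0.9)   # about 0.79

# Hypothetical opponent with a slightly below-average defense (assumed value).
spain_xg = expected_goals(spain_offense, 1.1)
print(round(spain_offense, 2), round(spain_defense, 2), round(spain_xg, 2))
```

The same `expected_goals` call, with the opponent's offense and Spain's defense swapped in, would give the other side's number.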
Now that we have the expected goal values, we can use the Poisson model to determine the probability of scoring a certain number of goals. Given an average number of occurrences, the Poisson distribution tells us how likely any whole number of occurrences is to happen. The probability P of n goals scored, given an average of x expected goals per game (after accounting for the opponent's defense), is given by the equation
$$P = \frac{x^n e^{-x}}{n!}$$
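The Poisson formula is easy to check numerically. Here is a minimal Python sketch; the function name `poisson_pmf` and the expected goals value of 1.5 are my own choices for illustration, not figures from the model.

```python
import math


def poisson_pmf(n, x):
    """Probability of exactly n goals when x goals are expected."""
    return x ** n * math.exp(-x) / math.factorial(n)


# Probabilities of 0 through 5 goals for an assumed expected value of 1.5.
goal_probabilities = [poisson_pmf(n, 1.5) for n in range(6)]
for goals, probability in enumerate(goal_probabilities):
    print(goals, round(probability, 4))
```

The six probabilities sum to just under 1, since the small remainder belongs to scores of six or more goals.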
The table below shows the probability of each scoreline in a hypothetical match between Russia and Uruguay. Each cell is found by multiplying the probability of one team's goal count by the probability of the other's, which treats the two teams' scoring as independent. The probability of a team scoring more than five goals is negligible, so the grid stops at five.
| Russia 0 | Russia 1 | Russia 2 | Russia 3 | Russia 4 | Russia 5 |
From here we can find match predictions: the most likely score (the cell with the highest probability in the grid), the probability of a draw (the sum of all the probabilities on the diagonal), the probability of a Russia win (the sum of all probabilities northeast of the diagonal), and the probability of a Uruguay win (the sum of all probabilities southwest of the diagonal). These outcomes can help one determine a strategy for wagering and for analyzing matches and teams.
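The grid and the three outcome sums can be sketched in a few lines of Python. This is only an illustration: the expected goals values of 1.6 and 0.9 are invented rather than the model's actual Russia and Uruguay numbers, and the function names are my own.

```python
import math


def poisson_pmf(n, x):
    """Probability of exactly n goals when x goals are expected."""
    return x ** n * math.exp(-x) / math.factorial(n)


def score_grid(xg_a, xg_b, max_goals=5):
    """grid[i][j] = P(team A scores i and team B scores j), assuming independence."""
    return [
        [poisson_pmf(i, xg_a) * poisson_pmf(j, xg_b) for j in range(max_goals + 1)]
        for i in range(max_goals + 1)
    ]


def outcome_probabilities(grid):
    """Sum the cells on each side of the diagonal and on the diagonal itself."""
    size = len(grid)
    win_a = sum(grid[i][j] for i in range(size) for j in range(size) if i > j)
    draw = sum(grid[i][i] for i in range(size))
    win_b = sum(grid[i][j] for i in range(size) for j in range(size) if i < j)
    return win_a, draw, win_b


def most_likely_score(grid):
    """Return (goals_a, goals_b) for the highest-probability cell."""
    size = len(grid)
    cells = [(i, j) for i in range(size) for j in range(size)]
    return max(cells, key=lambda cell: grid[cell[0]][cell[1]])


# Assumed expected goals values, for illustration only.
grid = score_grid(1.6, 0.9)
win_a, draw, win_b = outcome_probabilities(grid)
print(most_likely_score(grid), round(win_a, 3), round(draw, 3), round(win_b, 3))
```

Because the grid stops at five goals per side, the three probabilities add up to slightly less than 1; dividing each by their total is one simple way to rescale them.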
However, Poisson predictions for soccer matches have a couple of limitations. For one, they cannot take injuries, yellow or red cards, or tournament scenarios into account. In the last round of group stage matches, many teams abandon their normal style and play for a draw in order to qualify, or rest their star players if the squad has already qualified for the next phase of the tournament. In those cases the probabilities may be skewed. Another very important limitation concerns defense. A very "strong" defensive rating under .5 can wipe out an opponent's expected goals value, which is unrealistic, especially with a small sample size. An offense scoring 2 expected goals a game matched with a defense allowing .2 expected goals per game should have a larger expected goals value than just .4. In other words, a great offense should not be severely penalized by a great defense; otherwise the model produces unrealistically low expected goal values and could predict that an outstanding defensive team will go all the way regardless of its offense. So I capped the defensive ratings at a minimum of .5.
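The cap itself is a one-line adjustment. A minimal sketch, assuming the cap is applied to the defensive strength ratio before the expected goals multiplication (the function names and the sample ratings of 1.75 and 0.2 are illustrative assumptions):

```python
AVG_GOALS_PER_GAME = 1.281    # average quoted earlier in the article
MIN_DEFENSIVE_STRENGTH = 0.5  # floor on the defensive rating described above


def capped_defensive_strength(raw_strength):
    """Keep very low (very strong) defensive ratings from dropping below the floor."""
    return max(raw_strength, MIN_DEFENSIVE_STRENGTH)


def expected_goals(offense, opponent_defense):
    """Expected goals using the capped defensive rating."""
    return offense * capped_defensive_strength(opponent_defense) * AVG_GOALS_PER_GAME


# An assumed strong offense (1.75) against an assumed extreme defense (0.2).
uncapped = 1.75 * 0.2 * AVG_GOALS_PER_GAME
capped = expected_goals(1.75, 0.2)
print(round(uncapped, 2), round(capped, 2))
```

Ratings of .5 or higher pass through unchanged, so the cap only affects the handful of extreme defenses.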
For the third round of group play, the outcome with the highest win/loss/draw probability matched the result 7 times, and the outcome with the lowest probability matched the result only twice. One of those was due to teams playing for a draw, and the other was South Korea's improbable upset of Germany, which shocked the entire globe. Applied to the now-complete knockout bracket, the model favors Brazil to score a narrow victory in extra time over England in the final. If that's the prediction, we're in for a thrilling last two weeks at the World Cup.