The highly anticipated MLB All-Star Game is quickly approaching after a one-year hiatus caused by the pandemic. As in 2019, the All-Star Game is using a two-phase voting system. In Phase 1, which ended on June 24, fans voted among a group of club-nominated players at each starting position. The top three vote-getters at each position (top nine outfielders) in each league now advance to Phase 2, where fans vote for a single player at each position from the remaining pool. The final All-Star starting lineup will be announced on July 1 at 9 pm, after Phase 2 ends.
This project predicts this year's All-Star starting lineup using the results from Phase 1 and common stats (games, age, runs, hits, home runs, RBIs, strikeouts, batting average) that were predictive of All-Star status in previous years. Using Baseball Reference, we compiled every player’s stats from 2017-2019 and added a field recording whether or not the player was an All-Star in that year. We imported the data into JMP (pronounced "jump"), a statistical analysis tool, and used JMP’s Partition function to analyze how predictive each stat was of All-Star status.
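To make the idea of a partition concrete, here is a minimal sketch of how a decision-tree split on a single stat can be chosen. It is an illustration only: the hit totals and All-Star flags are made up, the function names are hypothetical, and Gini impurity is used as a stand-in, since JMP’s own splitting criterion may differ.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_split(values, labels):
    """Return (cutoff, weighted_impurity) for the best 'value > cutoff' split."""
    pairs = list(zip(values, labels))
    uniq = sorted(set(values))
    best = (None, float("inf"))
    # Try a cutoff halfway between each pair of neighboring values.
    for lo, hi in zip(uniq, uniq[1:]):
        cutoff = (lo + hi) / 2
        below = [y for x, y in pairs if x <= cutoff]
        above = [y for x, y in pairs if x > cutoff]
        score = (len(below) * gini(below) + len(above) * gini(above)) / len(pairs)
        if score < best[1]:
            best = (cutoff, score)
    return best


# Made-up season hit totals and All-Star flags, for illustration only.
hits = [60, 85, 98, 110, 120, 140, 155, 170]
all_star = [0, 0, 0, 0, 1, 0, 1, 1]

cutoff, impurity = best_split(hits, all_star)
print(f"Best split: hits > {cutoff} (weighted impurity {impurity:.4f})")
```

A full partition repeats this search over every stat and then again inside each resulting group, which is how the tree builds up chains of cutoffs across hits, runs, home runs, and RBIs.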
The tree shown above is only part of the full partition. As an example of how to read it, a total of 61 players from 2017-2019 had over 115 hits and over 100 runs in a given year, and 23 of them (about 38%) were voted in by fans as All-Star starters that year. Thus, having over 115 hits and over 100 runs is quite predictive of All-Star status. However, having over 115 hits and over 100 runs but fewer than 37 home runs and fewer than 79 RBIs is not predictive of being an All-Star, so 2021 players with those stats were eliminated from our list of possible All-Star predictions. Because we are only about halfway through the 2021 season, we divided all of these cutoffs by two before applying them to 2021 stats. In this way, we ran the large data set of 2021 players through the partition tree and eliminated those who failed to meet its criteria. In the end, 63 players passed through the partition, and 25 of them were among the 51 finalists for the 2021 All-Star Game.
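The filtering step for this one branch can be sketched as follows. This is a rough illustration, not the exact procedure we ran in JMP: the `Batter` class, the `passes_branch` function, and the three players are hypothetical, and the code assumes a player is dropped only when he falls below both the home run and RBI cutoffs.

```python
from dataclasses import dataclass

# Full-season cutoffs quoted in the text for one branch of the partition.
FULL_SEASON = {"hits": 115, "runs": 100, "home_runs": 37, "rbi": 79}
# Halved, because the 2021 season is roughly half over.
HALF_SEASON = {stat: cutoff / 2 for stat, cutoff in FULL_SEASON.items()}


@dataclass
class Batter:
    name: str
    hits: int
    runs: int
    home_runs: int
    rbi: int


def passes_branch(b, cut=HALF_SEASON):
    """True if the batter survives this branch (assumed reading of the rule)."""
    if b.hits <= cut["hits"] or b.runs <= cut["runs"]:
        return False
    # Assumption: a player is dropped only when below BOTH power cutoffs.
    low_power = b.home_runs < cut["home_runs"] and b.rbi < cut["rbi"]
    return not low_power


# Hypothetical mid-season stat lines, not real players.
players = [
    Batter("Hypothetical A", hits=80, runs=55, home_runs=20, rbi=50),
    Batter("Hypothetical B", hits=75, runs=52, home_runs=8, rbi=30),
    Batter("Hypothetical C", hits=40, runs=30, home_runs=12, rbi=28),
]

survivors = [b.name for b in players if passes_branch(b)]
print(survivors)
```

Note that halving 115 hits gives 57.5, so a player needs at least 58 hits to clear that cutoff.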
One flaw with the partition is that it fails to account for injuries, so several star names did not make it onto our list. For instance, Mike Trout, who ranks fourth in the league in votes (just over 2 million), did not make it far in the partition because he played only 36 games and fell short of the 58 hits needed (115 hits/2) under the partition criteria. Likewise, Bryce Harper, who has missed over 20 games due to injuries, also failed to rack up the number of hits needed to pass through the partition.
The partition also focused strictly on each player’s performance this year and did not account for how that player had done in previous years. Take Juan Soto, for example. Soto, who is well known for his bold demeanor and for carrying the Nationals to their World Series victory in 2019, has been significantly underperforming this year compared with his incredible 2019 season. Under the partition criteria, Soto is lacking in runs, batting average, and RBIs, while in 2019 he cleared every cutoff easily. In fact, the outfielders who passed through the partition (e.g., Raimel Tapia, Teoscar Hernandez, Bryan Reynolds) significantly surpass Soto in every major statistic despite playing only a few more games than he has. Yet many of these players are nowhere close to having as many votes, or as much popularity in the baseball world, as Soto. This example suggests that fans weigh past performance and popularity more heavily than performance in the current season when they vote.
Using the results of our partition and the results from Phase 1, here is our prediction for the 2021 All-Star starting lineup:
National League
- Catcher: Buster Posey**
- 1B: Freddie Freeman
- 2B: Adam Frazier*
- SS: Fernando Tatis Jr.
- 3B: Kris Bryant
- OF: Ronald Acuna Jr., Nick Castellanos, Jesse Winker
American League
- Catcher: Salvador Perez
- 1B: Vladimir Guerrero Jr.
- 2B: Marcus Semien
- SS: Xander Bogaerts**
- 3B: Rafael Devers
- OF: Aaron Judge, Adolis Garcia, Teoscar Hernandez
- DH: Shohei Ohtani
** None of our predictions for this position and league were correct, but out of the three predetermined finalists, this player made it the farthest through our partition. The number of votes a given player received played no role whatsoever in our predictions.