March Methods Madness at the 2020 MIT Sloan Sports Analytics Conference

Since introducing the Research Paper Competition in 2010, the MIT Sloan Sports Analytics Conference has become one of the biggest stages for statistical research in sport. In this post, I review the methods used by the finalists and reflect on what this could suggest about current trends in statistical analysis in the sports industry.

Since 2010, at the time of year when many sports fans are preparing their brackets for March Madness, a handful of sports nerds have prepared talks to present as part of the Research Paper Competition at the MIT Sloan Sports Analytics Conference (SSAC). This week was no exception, though a number of events made the 2020 edition of MIT SSAC a strange one.

The continued spread of COVID-19 didn’t result in the cancellation of the conference, but it did prevent some speakers, and an unknown number of attendees, from appearing. For those who did attend, there was a general sense of anxiety, and the already awkward, technically-minded crowd was even more awkward than usual, unsure how to greet one another, which had to have hampered some conversations.

For those of us looking for more technical rigor at the event, perhaps an even greater disappointment with SSAC this year was the absence of Luke Bornn among the finalists. Last year, Bornn announced that it would be his last as a participant in the research competition, and reading over this year’s finalists made the ramifications of that decision clear.

But SSAC research has always been a mixed bag, with many highlights among some lowlights. So what were the methodological highlights from this year, and what could have some use for tennis performance analysis?

One of the papers with the strongest theoretical grounding this year came from two Stanford students, Evan Munro and Martino Banchio. They were interested in designing a draft allocation policy that discourages tanking, a phenomenon that has been most common in the NBA, where draft allocation is a function of end-of-season team rankings. Munro and Banchio derived a mathematical model of team objectives in which teams allocate effort over a season while accounting for the eventual impact on their draft position. There are a lot of simplifying assumptions involved, but I commend the attempt to formalize the process. It also made me think how this kind of model of incentives could surely have helped avoid embarrassing choices in tennis tournament design, like the handling of forfeited rubbers in the revamped Davis Cup.
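
To make the incentive problem concrete, here is a toy sketch of my own, not the authors' model, of why a rank-based draft can make tanking the payoff-maximizing choice for a weak team. All of the values and probabilities below are made up for illustration.

```python
import numpy as np

# Toy model of the tanking incentive (illustration only, not Munro & Banchio's
# model): a weak team picks an effort level and trades off a small chance of
# playoff value against better odds at a valuable draft pick.

def expected_payoff(effort, playoff_value=10.0, top_pick_value=8.0,
                    p_playoffs_at_full_effort=0.05):
    p_playoffs = p_playoffs_at_full_effort * effort   # more effort, better record
    p_top_pick = 1.0 - 0.8 * effort                   # worse record, better pick odds
    return p_playoffs * playoff_value + p_top_pick * top_pick_value

efforts = np.linspace(0.0, 1.0, 11)
best = efforts[int(np.argmax([expected_payoff(e) for e in efforts]))]
print(f"payoff-maximizing effort: {best:.1f}")        # 0.0, i.e. tanking
```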

Interested in how players move in matches? Neil Johnson of ESPN argued that off-the-shelf computer vision tools in python can be stitched together to get player tracking from single-camera broadcasts. There was no real model building in this project; it was more a pipeline for assembling available tools to extract new data about player movement from video. Clearly, such a tool would have relevance for all sports. But anyone who has ever tried to install and run a model in tensorflow knows that there is a massive gap between open-source code and getting the results we want.
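
As a rough illustration of this kind of off-the-shelf pipeline, and not Johnson's actual implementation, here is a minimal sketch that runs OpenCV's built-in person detector over a broadcast clip to get crude player bounding boxes frame by frame. The file name is a placeholder, and a serious pipeline would add a stronger detector, tracking across frames, and a court homography.

```python
import cv2

# Minimal off-the-shelf pipeline sketch: OpenCV's HOG person detector applied
# frame-by-frame to a broadcast clip to get rough player boxes.
# 'broadcast_clip.mp4' is a placeholder file name.

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

cap = cv2.VideoCapture("broadcast_clip.mp4")
detections = []  # (frame_index, x, y, w, h) for each detected person

frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    boxes, _ = hog.detectMultiScale(frame, winStride=(8, 8))
    for (x, y, w, h) in boxes:
        detections.append((frame_idx, x, y, w, h))
    frame_idx += 1

cap.release()
print(f"{len(detections)} detections across {frame_idx} frames")
```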

Despite all of the hype over deep learning, many of the architectures have been built around image data, and this has limited their use with positional data. Researchers have most often gotten around that by converting positional data into a picture. But what if we could feed positional data directly into neural nets? This is exactly what Michael Horton of SportsLogiq set out to do in Learning Feature Representations from Football Tracking. I am a bit skeptical about the feasibility of automatic feature representation across multiple sports; some domain knowledge would seem necessary, especially given the difficulty of training these kinds of models and understanding what they are doing. But I love the ambition of the idea, and it is the approach I am most excited to try to implement myself.
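
For a sense of what feeding positional data directly into a neural net could look like, here is a minimal sketch, not Horton's architecture, of a sequence autoencoder that learns a fixed-length representation from raw (x, y) tracking coordinates. The dimensions and data are invented for illustration.

```python
import numpy as np
import tensorflow as tf

# Sketch of learning a feature representation directly from tracking data:
# a sequence autoencoder over (x, y) coordinates of 22 players over 50 frames.

n_frames, n_features = 50, 44          # 22 players x (x, y)
tracks = np.random.rand(256, n_frames, n_features).astype("float32")  # fake data

encoder = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_frames, n_features)),
    tf.keras.layers.LSTM(32),          # 32-dimensional learned representation
])
decoder = tf.keras.Sequential([
    tf.keras.layers.RepeatVector(n_frames),
    tf.keras.layers.LSTM(32, return_sequences=True),
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(n_features)),
])
autoencoder = tf.keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(tracks, tracks, epochs=2, verbose=0)

embeddings = encoder.predict(tracks)   # features usable by downstream models
print(embeddings.shape)                # (256, 32)
```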

There were three baseball-specific papers among the finalists, and each addressed questions that were very specific to that sport. One was an impressive piece of work by a team of undergraduate students from the University of Rochester’s data science program. In their paper, Measuring the Impact of Robotic Umpires, the authors consider a future world where baseball umpires are replaced by robots. Although this would be a dystopia for current officials, it would be a world where more calls would be correct (about 10% of pitches in baseball are miscalled). The students use pitch-level data with information on misclassifications and combine this with league-average expected run values for each pitch outcome to estimate the expected change in value from turning historical missed calls into correct calls. There is a direct parallel to automatic line-calling in tennis, and it also made me think about how cool it would be if we could associate individual events in a tennis point with a point value.
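
The accounting behind that estimate is simple enough to sketch in a few lines. The run values and missed calls below are made up; the point is just that the impact of robotic umpiring is the summed difference between the run value of the correct call and the call that was actually made.

```python
# Toy accounting sketch (made-up numbers, not the Rochester team's values):
# the value of correcting a missed call is the difference in league-average
# expected run value between the correct outcome and the outcome as called.

run_value = {          # hypothetical league-average run values
    "called_strike": -0.04,
    "called_ball": 0.03,
}

missed_calls = [       # (call made, call that should have been made)
    ("called_ball", "called_strike"),
    ("called_strike", "called_ball"),
    ("called_ball", "called_strike"),
]

impact = sum(run_value[correct] - run_value[actual]
             for actual, correct in missed_calls)
print(f"expected run impact of correcting these calls: {impact:+.2f}")
```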

The Pulling Starters paper by Daniel Stone, Brian Mills, and Duncan Finigan used a regression approach to assess the decision to leave a starting pitcher in the game or pull him. I love papers that put a direct focus on decision making. But these papers often remind me of how hard evaluating decisions is. The reason comes down to decision assessment really being about cause and effect, and causal analysis is difficult. In this case, the authors were dealing with a decision that takes a whole variety of forms, as a pull can theoretically happen at any point in a game. This makes the treatment and comparison very hard to pin down, and I don’t think a regression analysis really addresses this fundamental aspect of the problem directly.

The last baseball paper was a proposal for an arbitration strategy. It had no actual analysis, which I would have thought would disqualify the paper. But so it goes at the 2020 SSAC.

Clustering applications have been a mainstay for classifying playing styles and position types at Sloan. This year had another example of this from Samuel Kalman and Jonathan Bosch in their NBA Lineup on Clustered Player Tendencies. The authors were motivated by the increasing notion of “position-less” players in the NBA. This seems like a really strong case for analytically derived player types, given that historical qualitative position labels may be losing their meaning. It was also another example of how box score data can still be relevant in the era of tracking data in sport.
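
As a quick illustration of the general technique, and not the authors' pipeline, here is a minimal clustering sketch that groups players into data-driven "positions" from a handful of box-score-style tendencies. The feature names and data are placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Sketch: cluster players into data-driven position types from per-possession
# box score tendencies. Features and data here are invented placeholders.

rng = np.random.default_rng(0)
features = ["threes_rate", "rim_rate", "assist_rate", "rebound_rate", "block_rate"]
X = rng.random((450, len(features)))          # ~450 NBA players, fake tendencies

X_std = StandardScaler().fit_transform(X)     # put tendencies on a common scale
kmeans = KMeans(n_clusters=6, n_init=10, random_state=0).fit(X_std)

player_types = kmeans.labels_                 # analytically derived "positions"
print(np.bincount(player_types))              # players per cluster
```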

Team chemistry continues to be an elusive concept in analytics. But maybe we are getting closer to ways of quantifying chemistry thanks to the work presented by Lotte Bransen and Jan Van Haaren of SciSports. Their idea was to extend well-known rating models, like Bradley-Terry, to components involving pairs of players. The ratings are parameters that attempt to explain the value of individual actions in a soccer match, what they call VAEP ratings, and the total value of an action is modeled as the sum of the ratings of the pair of players involved. Focusing only on player pairs and the assumption of additivity may be simplifications worth relaxing, but it seemed a useful initial framework.
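
To make the additivity idea concrete, here is a minimal sketch, not the SciSports model, that regresses the value of each pass on indicator columns for the two players involved, so that each action value is modeled as the sum of two player ratings. The data are simulated.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Sketch of an additive pairwise rating model: regress each action's value
# (e.g., a VAEP-style value) on indicators for the two players involved.

n_players, n_actions = 30, 5000
rng = np.random.default_rng(1)
passer = rng.integers(0, n_players, n_actions)
receiver = rng.integers(0, n_players, n_actions)
action_value = rng.normal(0.01, 0.05, n_actions)        # fake action values

# Design matrix: one column per player, marked 1 when involved in the action.
X = np.zeros((n_actions, n_players))
X[np.arange(n_actions), passer] += 1
X[np.arange(n_actions), receiver] += 1

ratings = Ridge(alpha=1.0, fit_intercept=False).fit(X, action_value).coef_
predicted_value = ratings[passer] + ratings[receiver]   # additivity assumption
print(ratings[:5])
```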

The papers wouldn’t be complete without another example of a deep learning model. This came from a group from Stats Perform who wanted to predict shot types in cricket. In You Cannot Do That Ben Stokes, the authors trained a deep LSTM on Opta ball-by-ball data that include labels of shot type, shot launch angle, the orientation of the fielders, and other match features. They emphasize the “personalization” of the model, which is achieved by including average tendencies of the batter and bowler as features in the model.
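
Here is a minimal sketch of what such a personalized sequence classifier could look like, not the Stats Perform model: an LSTM over the most recent deliveries, where each delivery's feature vector would include match context along with the average tendencies of the batter and bowler. The dimensions and data are invented.

```python
import numpy as np
import tensorflow as tf

# Sketch of a sequence classifier for shot type: an LSTM over the last 10
# deliveries, each described by match-context and player-tendency features.

seq_len, n_features, n_shot_types = 10, 16, 8
X = np.random.rand(1000, seq_len, n_features).astype("float32")  # fake deliveries
y = np.random.randint(0, n_shot_types, 1000)                     # fake shot labels

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(seq_len, n_features)),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(n_shot_types, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=2, verbose=0)

print(model.predict(X[:1]).round(2))   # predicted shot-type probabilities
```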

All in all, there were not any mind-blowing takeaways, especially for tennis analysis. It was clear that SSAC research continues to put the emphasis on machine learning approaches (with varying levels of depth) and on what we can think of as “stacking” these approaches. So the little slice of SSAC that is focused on technical research is increasingly looking like an ML event. If that is the trend, I would really like SSAC to adopt some other practices of ML conferences, like encouraging authors to include code with their submissions. This would not only help us differentiate what really works from what is just hype, but it would also help the sports analytics community to better understand and further improve on these methods.