Last week, the sports analytics world descended on Boston for the annual MIT Sloan Sports Analytics Conference (SSAC). Co-founded by Daryl Morey, general manager for the Houston Rockets—a team that, during his tenure, has shaped its identity around its relentless and often innovative use of stats—SSAC has become one of the most popular events for academics and hobbyists to share cutting edge research in sports data science.
Sounds really nerdy but the analytics revolution in sport has made the event a must-do for a growing list of sports celebs, including Billy Beane, John Urschel, and Sam Hinkie. Still, if you are like me, what’s best about Sloan isn’t the glitz and glamour but the hard numbers and exciting new tools statisticians are using to tackle sport’s toughest questions.
In 2016, the tenth year of the conference, tennis smashed SSAC. The sport that was voted 2nd to worst in its use of analytics in 2012 by ESPN had two papers appears in the highly selective research competition. And one of those, ‘The Thin Edge of the Wedge’ by Wei and colleagues, took top prize.
This year it was back to the same old and there were no research papers explicitly about tennis. Not a great result for the sport but tennis is still a relative newbie on the analytics front. And, as I like to look for silver linings, I think we can find a lot of directions and inspiration for future tennis work from the advances presented by other sports at the conference this year.
Below I’ve highlighted three research advances to come out of the 2017 SSAC and why I think they have the most relevance to tennis.
‘Increasing 3-Point Shooting Percentage’
In ‘A data-driven method for understanding and increasing 3-point shooting percentage’, Rachel Marty from UCSD and Simon Lucey from Carnegie Mellon use sensor data on over 1 million basketball shots from the 3-point zone to try to figure out how to improve shooting percentage for the most valuable shot in the game. Despite a shift in 3-point attempts in recent years that some have called a revolution for the NBA, the authors point out that the 3-point shooting percentage has remained ‘stuck at 35% for 20 years’.
So how could NBA players shoot more effectively? The authors identify four ‘actionable’ factors related to the features of the ball trajectory: the angle median, angle consistency, depth median and left-right consistency. The authors sussed out these factors by defining a ‘guaranteed make zone’ (GMZ), a depth from the hoop that resulted in a 90% shooting percentage, and finding the variables that most strongly associated with the GMZ. With these attributes of good shot-making identified, the authors go on to describe how tracking of these factors in training combined with instant verbal feedback could help players shift their shooting style into a more effective range.
As a tennis researcher, this paper stood out to me because of its focus on shot effectiveness. In the same way that Marty and Lucey ask what modifiable factors are the most important for making or missing a 3-point shot in basketball, one could ask what physical characteristics distinguish a good from a poor shot in pro tennis?
We often have a sense when a player has made a poor shot choice but what exactly explains this gut feeling? Is speed the issue or is spin a bigger factor? Is a winning shot more likely to have depth or be closer to a sideline?
These are questions that tennis tracking data could help to address. However, unlike with basketball where the scoring of the outcome of a shot is clear (it is either out or in), how the outcome of a tennis shot should be scored is less obvious. Do we treat a winner as a better shot than a shot that forces an error? Should all shots that are in play be treated the same?
The problem of how to score a shot outcome in tennis is a good example of the challenge of adopting research methods from other sports. No method will be a one-size-fits-all. A large part of the hard work of tennis research is not only identifying applicable methods but knowing how to tailor them to suit the nature of the game.
Another research paper at this year’s Sloan conference was also interested in NBA shooting (no surprise there). While Marty and Lucey focused on the physical characteristics of the shot in relation to the hoop, Panna Felsen of UC Berkeley and Patrick Lucey from STATS looked at a completely different set of shooting factors: player body position.
Using player pose information for 1,500 3-point shots in the NBA, Felsen and Lucey were able to explore the ‘anatomy’ of a 3-point shot. You might think that the only way the authors could get such “pose” information—17 positional attributes of the limbs of a player’s body—was manual coding with an army of Berkeley undergads and a large supply of Redbull. But actually, machine learning models for pose capture from single-camera video has advanced far enough that it can take the place of human coders. The actual pose extraction method the authors used is one created by a group at the Robotics Institute at Carnegie Mellon called ‘Convolutional Pose Machines’. When applied to NBA broadcast, it produces these marionette-like representations of plays that is eerily artful and extremely useful for advancing on the information SportsVU provides.
With the pose data, the paper ‘Body Shots: Analyzing Shooting Styles in the NBA using Body Pose’ uncovers many details of successful shots. For example, in difficult shot situations, players who took a right foot step just prior to the shot and did not run prior to the shot had a higher shooting percentage. The authors also looked at the distinguishing pose characteristics of the poster child for 3s, Steph Curry. They found he was much more likely to be moving and off-balance when making a shot than other players.
‘Body Shots’ was one of my favorite papers of the conference for two main reasons. It addresses the 400 lb gorilla about sports tracking data which is that it still doesn’t tell us everything we want to know. And, they go on to apply a generalizable method for extracting more information about player body position from video footage alone. The fact that no special camera system or wearable sensors could provide such rich information about player body movement should make any sports statistician giddy with the possibilities.
What does it all mean for tennis? The possible applications are really too long to list. But to consider just a few things, suppose the debate over the better backhand— Wawrinka or Federer— could be addressed with data about each player’s arm, leg and torso position for every match they every played? Or what if we could breakdown the evolution of the mechanics of Nadal’s forehand by looking at how his body position generates so much spin? What if pose information could tell us which players are at greatest risk of a major injury? These are just some of the questions that methods like CPM could start to help tennis researchers tackle with larger data sets on body movement information than the sport has ever seen.
The two papers we have looked at so far both look at the events around a shot but don’t really consider the temporality or multiplicty of events. Suppose a player is stepping into position to make a 3-point attempt and a teammate at the same time cuts away to prepare for a screen. Would we expect that to be a more effective shot than one with a shooter undefended? I don’t know the answer but I hope it makes the point that this is a very different kind of question than any addressed in the two papers discussed above because it is concerned with concurrent events and event sequences.
In ‘Possession Sketches: Mapping NBA Strategies’ Andrew Miller of Harvard University and Luke Born of Simon Fraser University, both part of the XYResearch team, present a method that attempts to breakdown the event structure of NBA possessions using player tracking data. Two key ingredients to the methodology they present is the identification of ‘actions’ using functional data clustering methods and the possession modeling using text analysis techniques.
The clustering step is to find similar types of actions (based on a few seconds of tracking data) without making assumptions about what an action should be. You would expect a screen that starts and ends in approximately the same locations on the court would be one type of action. This method not only finds those and all other distinguishable action types, but it provides a way to visualise them in 2D.
The most ingenious step of the paper is its possession model. Miller and Bornn recognize that the concept of a ‘pattern’ in the NBA— a temporal sequence of events in space that frequently occur together— is not that dissimilar to the concept of ‘topics’ with documents, where models are now well established. Why couldn’t the textual models then work for identifying NBA patterns? That is exactly what Miller and Bornn wanted to find out.
Using a ‘bag of words’ model with pairs of simultaneous actions in the NBA, Miller and Bornn create a dictionary of possession types. A major strength of the approach is its searchability. For instance, one could query the two most common possessions of the Cleveland Cavaliers or the three most different three-point shot patterns.
The implications of this work for tennis should be pretty clear. Just as Miller and Bornn breakdown possessions into player movement that occurs close in time, one could define ‘actions’ in tennis by the combination of how players prepare for a shot and how their opponents prepare to receive. Considering both the shot and player trajectories at the same time could show which return position is the most effective for receiving an inside-out forehand from the deuce side wide, the most common serve + 1 shot after a serve down the T to the Ad court, or which player in the top 100 has a shot pattern most like Roger Federer’s. ‘Possession sketches’ presents a strategy for tennis researchers to use tracking data to start to understand shot patterns and player actions in the sport.
Despite the lack of tennis papers at the 2017 MIT Sloan Conference there was plenty of interest for tennis research. Advances in NBA analytics were especially strong and, I hope, I’ve highlighted some areas— from understanding shot effectiveness to better characterizing player movement— where cross-pollination of data science methods could be particularly fruitful for tennis.