Introducing Return Impact Maps

Which of the top ATP players are the most aggressive on the serve return? Which are the most defensive? Is Andrey Rublev’s serve return style closer to Roger Federer’s or Novak Djokovic’s? In this post, I introduce an interactive visualization tool to help answer these and many other questions about the return impact of top men’s tennis players.

One of the things that is woefully missing from mainstream tennis is compelling graphical analysis. Tracking systems like Hawk-Eye have been gathering rich data at pro events for 15 years. Yet the most advanced display of that information hasn’t gone beyond a basic dot chart showing where balls landed or were hit from. This kind of chart shows us information, but it doesn’t really help us draw interesting conclusions, such as what patterns a player might be showing and whether those patterns are unusual in some way. For these kinds of questions, we need both an analysis to extract key summaries from what is observed and a good approach for visualizing those summaries.

During the off-season, I thought it was time to stop complaining about the state of visualizations of tracking data in pro tennis and try to walk the walk. After weeks of modeling, working through ideas with stats maestro Jim Albert, and wrangling with d3, I am pleased to present the first iteration of a ‘Return Impact Map’. This is an interactive heat map of the first return impact patterns of 84 of the top ATP players in today’s game. The position shown is the expected surface of return impact against an average server for the specified return type and surface. Importantly, the chart reflects each player’s average tendencies over several years of ATP match data and adjusts for each server the player has faced, which can help us separate a receiver’s style from how receivers usually adjust to specific servers. More details of how this is done are provided at the end of the post.

You can use the drop-down menus at the top of the chart to change the player, serve type, and surface. Toggling between players is a great way to contrast the variation among impact styles. You probably would have guessed Nick Kyrgios and Roger Federer have some of the most aggressive return impact positions in the sport, but did you know Alex De Minaur is just as aggressive? Or that Daniil Medvedev is more defensive in his impact position than Rafael Nadal?

These are just a few of the takeaways this kind of tool helps us find easily. I will summarize some of the interesting findings I’ve uncovered in working with these data in a future post. I also have summaries of second return impact for the same players and will create a separate map for those results. It turns out that second return impact is more complex than first return impact in significant ways and needed some modifications to the first return model. But more on that to come…

Model of Return Impact

The model of first return impact can be thought of as a bivariate plus-minus model with both receiver and server offsets to the population average. To see what I mean, let $\mathbf{Y}_{srn}$ be the $n$th return impact point of receiver $r$ versus server $s$. The return impact is a two-dimensional coordinate giving the depth and width of a player’s position, in the court’s coordinate system, for every returned point. The key part of the model is how we set the expected value for this position, which is as follows:

$$\boldsymbol{\mu}_{srn} = (\boldsymbol{\alpha} + \boldsymbol{\alpha}_r - \boldsymbol{\alpha}_s)' \mathbf{X}_{srn}$$

The vector $\mathbf{X}_{srn}$ is where we capture the context of any particular impact. This includes indicators for the serve direction, surface and court side. The parameters $\boldsymbol{\alpha}$ are the effects for each dimension of position (depth and width) for the average receiver. The effects $\boldsymbol{\alpha}_r$ are the offsets for the receiver $r$. So, for example, a player who impacts deeper than the average in most situations would have a positive term for their $\boldsymbol{\alpha}_r$ intercept for the depth dimension. The same idea applies to the server effects $\boldsymbol{\alpha}_s$, though the server offset is subtracted rather than added.
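To make the plus-minus structure concrete, here is a minimal NumPy sketch of the mean calculation. Everything in it is an assumption for illustration: the layout of the context vector (an intercept plus one indicator each for serve direction, surface and court side), the function name `expected_impact`, and all of the numbers, which are made up rather than estimated from the data.

```python
import numpy as np

# Hypothetical context vector X: [intercept, wide serve, clay, ad court].
x = np.array([1.0, 1.0, 0.0, 1.0])

# Each effect matrix has one row per context term and one column per
# position dimension (depth, width). Values are invented for illustration.
alpha = np.array([[1.5, 0.0], [0.2, 1.0], [0.5, 0.0], [0.0, -0.3]])    # average receiver
alpha_r = np.array([[0.8, 0.1], [0.0, 0.0], [0.1, 0.0], [0.0, 0.0]])   # receiver offsets
alpha_s = np.array([[-0.4, 0.0], [0.0, 0.2], [0.0, 0.0], [0.0, 0.0]])  # server offsets


def expected_impact(alpha, alpha_r, alpha_s, x):
    """Return the (depth, width) mean: (alpha + alpha_r - alpha_s)' x."""
    return (alpha + alpha_r - alpha_s).T @ x


# Mean impact for this receiver against this particular server.
mu = expected_impact(alpha, alpha_r, alpha_s, x)

# Against an average server, the server offsets are all zero.
mu_vs_average = expected_impact(alpha, alpha_r, np.zeros_like(alpha_s), x)
```

Setting the server offsets to zero is one way to read the "average server" quantity that the map displays: what remains is the population effect plus the receiver's own offsets.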

Once we have the mean specified, each specific outcome is regarded as a draw from a bivariate normal with that mean and variance-covariance $\boldsymbol{\Sigma}$:

$$\mathbf{Y}_{srn} \sim MVN(\boldsymbol{\mu}_{srn}, \boldsymbol{\Sigma})$$
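As a rough illustration of this sampling step, the sketch below simulates return impact points from a bivariate normal. The mean and covariance values are invented for the example and are not estimates from the fitted model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mean (depth, width) and variance-covariance matrix.
mu = np.array([2.9, 0.6])
sigma = np.array([[0.6, 0.1],
                  [0.1, 0.9]])

# Each row is one simulated impact point Y = (depth, width).
draws = rng.multivariate_normal(mu, sigma, size=5000)

# With enough draws, the sample moments should sit close to mu and sigma.
sample_mean = draws.mean(axis=0)
sample_cov = np.cov(draws, rowvar=False)
```

The off-diagonal term of the covariance is what lets depth and width move together, which is the reason for modeling the two dimensions jointly rather than as two separate regressions.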

I use a Bayesian implementation of this model, where I can specify shared priors for the server and receiver effects. This has the advantage of partial pooling across the player effects, where we balance between global and individual effects in a way that accounts for the differences in sample size across players. The model was fit with pymc3 on a dataset of 1,287 ATP matches between 2018 and 2020, which included 74,883 first return points. Huge thanks to Brandon Willard for his work on the JAX backend in pymc3, which greatly accelerated the fit of the model.
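The effect of partial pooling can be seen in a much simpler setting than the full model. The sketch below is a one-dimensional normal-normal example with known variances, not the model fit in pymc3; the function name `pooled_offset` and the values of `sigma` and `tau` are assumptions chosen for illustration. It shows how a player's raw average offset is shrunk toward the population value of zero, with less shrinkage as the number of observed points grows.

```python
def pooled_offset(player_mean, n_points, sigma=1.0, tau=0.5):
    """Posterior mean of a player offset under a Normal(0, tau^2) prior.

    player_mean: raw average offset observed for the player
    n_points:    number of return points observed for the player
    sigma:       assumed point-to-point standard deviation
    tau:         assumed standard deviation of offsets across players
    """
    weight = n_points / (n_points + sigma**2 / tau**2)
    return weight * player_mean


# Same raw offset of 1.0, very different amounts of data.
few_points = pooled_offset(1.0, n_points=4)     # pulled halfway to zero
many_points = pooled_offset(1.0, n_points=400)  # stays close to 1.0
```

A player with only a handful of tracked returns is pulled strongly toward the average, while a player with hundreds of returns keeps an estimate close to their own data.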