The Future of Tennis Data - A Conversation with Edoardo Salvati

Earlier this week, FiveThirtyEight writer Oliver Roeder published an article entitled “Can An Astrophysicist Change The Way We Watch Sports?” The piece documents the efforts of Matt Ginsberg (the astrophysicist in question) to develop a system for real-time predictions of shot-making for the NBA and, at the same time, raises a number of broad questions about the future of advanced analytics in sports and sports broadcasting. One of the questions that most intrigued me is whether, as the title of the article suggests, any one person can “revolutionize” how we experience sports.

The idea of one individual single-handedly (or perhaps with the aid of a programming whiz of a son and the financial backing of a dealmaking tycoon) advancing the science of sport has its romantic appeal. But when it comes to improving the quality and richness of information about sport, I suspect that crowds will make the real breakthroughs, just as they have in creating a free online encyclopedia, digitizing text, and translating the Web.

Consider tennis and the problem of stroke-by-stroke data (i.e., every forehand, slice, etc., and the sequence in which they were played). Despite the global audience tuning in to professional tennis matches these days, until a few years ago there was no public data source with information about the strokes played in any professional tennis match. And if we think about how this information could be gathered and made available to the public routinely, one could imagine that some genius might one day teach a machine to read “overheads” and “winners” off a video recording of a match, and that said genius would have the magnanimity not to sell this machine to the highest bidder. Until that day comes, a collaborative effort will be needed.

The Match Charting Project (MCP) is the most successful effort of this kind. Conceptualized in 2012 and launched in 2013 by Jeff Sackmann, creator of Tennis Abstract and Heavy Topspin, the MCP enlists volunteers to become “citizen scorers” for every stroke in professional tennis matches of their choosing. Sackmann provides volunteers with an Excel spreadsheet to help them chart in a standardized way and pushes new matches to a GitHub repo where anyone can access them.

A little more than one month ago, the MCP could boast 1,100 charted matches by 57 different contributors; today there are 1,300. When I did a brief analysis of how charted matches were distributed across contributors, I found a highly skewed distribution: a handful of contributors had charted 20 or more matches each, while the large majority had charted only a few. This made me curious about who the “super”-contributors are, and about what their motivations and charting processes tell us about the future of the MCP and similar efforts.

As luck would have it, I recently became acquainted with Edoardo (Edo) Salvati, an MCP super-contributor. Edo is a native of Rome who now lives in Milan, a mountaineer, and an avid reader of The Economist. He works in business development for a boutique law firm while developing a blog on sports. Recently, Edo was kind enough to answer some questions about the story behind his involvement with the MCP. Below is a transcript of that conversation.


Q1. How did you get interested in tennis?

When I was growing up, I was looking for a sport that could provide a more relatable and less brutal experience than football (both to play and to watch on TV), which is something of a religion in Italy. Tennis, with its fair play, calm and sportsmanship, was a natural antidote. My first vivid memory is Stefan Edberg’s Wimbledon semifinal loss to Michael Stich in 1991, when he lost the match without dropping serve. It was also my first taste of a statistical oddity in sport. I remember watching my first matches, like the 1992 US Open final, on a black and white TV my family owned that had terrible reception. When I was still enthralled despite the fluttering lines of grey caused by the bad signal, I knew I was hooked!

Q2. What is your current involvement with the sport? Are you a player, fan, commentator, or would you describe your relationship to the sport in some other way?

I am a sports addict. Really, I need a daily dose of almost any kind of sport, whether watching it on TV, reading online commentary, or playing. Tennis is my favorite sport right now, and I am a long-time fan. I play at a good amateur level, but back issues sometimes limit how much I can play. Recently, I have started writing about the game: a piece of mine on Eva’s umpiring at this year’s US Open final was published in an online Italian sports magazine.

Q3. Which tours do you follow?

Primarily the men’s Grand Slams, ATP matches, and occasionally Davis Cup. I don’t normally watch women’s matches, unless it’s a really big match, as I don’t derive the same entertainment value from the women’s game as from the men’s game. But I stay up-to-date with results for the WTA.

Q4. How do you follow what is happening in the sport? In other words, which tennis media, commentators, or players are your main sources of info about the game?

I rely on two main sources: the Internet and TV. I read sports articles from English-language news outlets (ESPN, Grantland [when it was in operation], FiveThirtyEight, Heavy Topspin, etc.) and from the tour and tournament websites. I also follow a few Italian tennis websites that are worth reading. In Italy, three TV channels regularly broadcast tennis matches: Supertennis TV, which is owned by the Italian tennis federation and has rights to several ATP 500s, WTA Premier/International events and Davis Cup ties; Sky Sport TV, which has rights to Wimbledon and the ATP Masters tournaments; and Eurosport (a sort of European ESPN, where Mats Wilander commentates), which has rights to the other three Slams, some ATP 250s, and most of the remaining WTA International events.

Q5. You have expressed some criticism of certain kinds of tennis writing (or perhaps of fans’ views?). Could you describe what you see as the problems with the current state of tennis media?

Overall, the Italian sports media lacks a statistical approach to tennis analysis. When numbers are used, it is usually to back up an opinion rather than as part of a larger, more objective analysis. There are a couple of websites where well-written Italian commentary can be found, but even these shy away from statistics. I think one reason for this is that fans have gotten used to commentary that is more like bar-room chitchat than informed debate. It will take a serious cultural shift among commentators and fans for this to change.

Just to give an example, some days ago the Financial Times published a one-on-one interview with Novak Djokovic. The most important financial newspaper in Italy had the piece translated for its sports section by a translator who had no clue about tennis. The result was full of mistakes about the game, which made for a poor reading experience.

Q6. When and how did you first learn about Heavy Topspin/Tennis Abstract?

At the beginning of 2015, while browsing the major American sports media outlets, I came across a tennis article by Carl Bialik of FiveThirtyEight that quoted Jeff Sackmann and his project/blog.

Q7. How did you end up getting involved with the Match Charting Project?

I congratulated Sackmann by email and asked if there was a way to determine a player’s record with specific umpires (say, Roger Federer’s match record with Pascal Maria). In our exchange he mentioned the MCP. I promised that I would have a look at the charting file for the project and try it out when I had the chance.

Q8. What motivated you to participate?

I was impressed by Sackmann’s work, which put the attempts at analysis posted on the ATP’s site to shame. After reading about the project on ESPN/Grantland, I also learned more about its motivation from his talk at the Sloan conference. I admire the fact that it’s an independent project that already provides a quick and powerful search engine for almost anything related to a player’s career or performance. As a tennis fan, I felt I had to contribute.

Q9. Have you ever scored before? Or been involved with another effort like this?

I have never scored before or been involved in a similar project. The MCP is one of a kind.

Q10. How did you learn to chart?

It has been a learning process. I downloaded the spreadsheet, carefully read the instructions worksheet and dove right in. My first match was the 2015 Monte Carlo Masters semifinal between Djokovic and Nadal. Obviously, there were a lot of shots and it took a very long time to chart. Charting is also more complicated when a left-handed player is involved. I had to alternate between typing, pressing the pause button on the remote control and flipping through the instructions to be sure my coding was correct. Sometimes I had to do this multiple times for one point. There were times I had to start over, especially in long rallies, because I had made a mistake or missed a shot. The process is the same today, but now I have the coding system memorized, so I can chart much faster. It is like learning a language: the more you practice, the greater your fluency. It can be frustrating at first, but it becomes more and more rewarding over time.

Q11. How do you decide which matches to chart?

Basically, it is a top-down approach, as it all depends on which matches I can record. Having a complete match recorded is not always possible, since the TV guide that drives the recording schedule is often unreliable. If I have several recorded matches to choose from, I decide according to who’s playing, the length or importance of the match, and its overall appeal. I prefer charting men’s matches, Slam finals or Tour finals. But I might also chart a first-round (R32) match at a 250, which is normally less challenging.

Q12. What is the experience of charting like? Do you do anything in particular to prepare? Do you use Jeff Sackmann’s spreadsheet, the Android app, or something else?

If you are a tennis fan or player (and being either one helps), charting lets you experience a match at a different level of understanding. Rallies become trajectories, the court becomes a landing area divided into sectors, and shots aren’t just the effect of Newton’s third law of motion but are defined by where and how they are hit. You get to really appreciate the kinetic beauty of an exchange between two elite competitors. The spreadsheet becomes a living object, like the green, cascading code in “The Matrix” (but less ominous for humanity!). The code itself reminds me of chess notation.

Charting requires time; no question about that. And, the more points played, the longer it takes. But then, when you’re done charting matches like the 2015 French Open final or the 2015 US Open final, you feel you have really accomplished something.

I don’t do anything in particular to prepare. I try to find a moment of the day when I know I won’t be interrupted. I check if the match I have recorded is complete. I open the spreadsheet on my laptop and type on the keyboard with my right hand, while I control the remote with my left.

Q13. Do you chart live matches or only recorded ones?

I only chart recorded matches. Live rallies are played too fast for me to input all the information required. I don’t know whether any contributors chart live, except for Sackmann himself, who I know can.

Q14. What happens after you have charted a match?

I send the spreadsheet to Sackmann, and it gets uploaded to the MCP database, which makes a wide range of detailed data about the match available to anyone on the Tennis Abstract website. He handles the data processing and the end-user stats, which are among the most valuable products of the project.

Q15. How many matches have you charted? How many have you attempted to chart?

At the time of writing, I have charted 76 matches. After charting two matches that had already been charted, I now make sure to check the database first and “claim” a match on the MCP sheet that lists the matches currently being charted (to avoid duplicating other contributors’ work). Every match I have attempted, I have charted to the end. Having established myself among the top contributors by number of matches, I am now more interested in the quality of the matches I chart than in becoming the number 1 contributor (Lowell, who currently has the most logged matches after Sackmann, has almost 100 more matches than I do). Also, I am quite happy with my contribution, considering that I started not too long ago. After the first match, I thought I’d never reach double digits.

Q16. If you have ever started charting and not finished, what was the reason why?

I recently started charting the 2009 Rome Masters semifinal between Federer and Djokovic but had to stop after a few games because the recording was corrupted and too many points were missing. I can’t judge the quality of a recording until I am actually watching the match. But this was the first time that has happened, and hopefully the last.

Q17. What is difficult about charting? Are there ways charting could be made easier?

Once you have familiarized yourself with the code, the input is immediate. What I still find tricky is deciding which letter/number to assign to a specific shot. Some shots land in a part of the court that falls exactly between two of the zones defined by the code’s letters/numbers. Shots that look similar can be coded differently within the same point because, on closer inspection, the ball travels in a slightly different direction. Another major decision is the distinction between forced and unforced errors, which is a problem even for official scoring. Also, the classification of shot type, direction and depth can be affected by factors like the TV camera angle or position, the quality of the video, replays or commercial breaks, which can sometimes be annoying or disrupt the charting. However, the code is exhaustive for the purpose of the data collection, and the spreadsheet is self-populating so that you can focus on charting. The “close calls” could be made easier by adding more letters/numbers, but then the overall input process would become too burdensome and redundant. I think it is a well-balanced tool as it is.

Q18. What are some questions you hope the MCP can help to answer?

I am neither a statistician nor a stats proselyte, but I can see a big role for stats in interpreting a match, understanding trends, providing a source for analysis, and helping to find ‘law and order’ in tennis. The MCP is unique because it tracks matches shot by shot. These data make a wide range of analyses possible, as the writing on Sackmann’s blog attests.