A Search for Surprisingly Sparse Head-to-Heads

March 14, 2021

Small sample sizes are typical of head-to-heads in pro tennis. Seeding and knockout draws mean that many pro players have played each other no more than a handful of times, and sometimes never at all. Still, I frequently find myself surprised when I come across a sparse head-to-head between seasoned players. It got me wondering whether that reaction is even reasonable, and how you might quantify how overdue a matchup is.

If you have ever tried to predict outcomes in tennis, you’ve had to think about the ‘head-to-head’. It is something that commentators also get worked up about, and “Did you factor in their head-to-head?” is likely the first thing they will ask about any model that predicts wins in tennis.

The attention paid to head-to-heads has always seemed to me out of proportion to the information most of them contain. Take the players rated 1800 or higher at any point in time and the average head-to-head between them is 0.6 matches. Pick two top players at random and they are quite likely never to have played each other.

I think people look at the 50+ head-to-heads among the Big 3 and somehow think this is representative of the sport, when it’s exactly the opposite. I was reminded of that this week when Roger Federer made his return to tour competition in Doha after a 13-month hiatus. With a narrow win over Dan Evans (a head-to-head of 3 before that match), Federer advanced to play Nikoloz Basilashvili, whom he had played only once before. With 32 years of pro-level play between them, this seemed to me an unusually sparse history.

But is it? How can we even say what a typical head-to-head should be?

That got me thinking about what factors might be predictive of the length of a head-to-head. Players who regularly reach the later rounds of tournaments are more likely to meet, which suggests that the overall skill of the two players is a key factor. And more years on tour, with more events played, means more opportunities to add to the tally of any head-to-head.

Do skill and match age explain head-to-head counts? I gathered the year-end cumulative head-to-heads of all male players with ratings of 1800 or more (a reasonable definition of ‘top’ players) from 2003 to the present. Using the combined ratings as a measure of total skill and the combined career matches played as ‘pro age’, the chart below summarizes what I found.
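
To make the data preparation concrete, here is a minimal Python sketch of how year-end cumulative head-to-heads and the two predictors could be built from raw match records. The record layout, the player names, and the function names (`cumulative_h2h`, `pair_counts`, `pair_features`) are all assumptions made for illustration, not the code or data behind the chart.

```python
from collections import Counter
from itertools import combinations
from math import log

# Hypothetical match records: (season, player, opponent). Invented for illustration.
matches = [
    (2019, "A", "B"),
    (2019, "A", "C"),
    (2020, "B", "A"),
    (2020, "A", "B"),
]

def cumulative_h2h(matches):
    """Return {season: Counter of meetings per pair up to that season's end}."""
    running = Counter()
    by_season = {}
    for season, p1, p2 in sorted(matches):
        pair = tuple(sorted((p1, p2)))  # A-B and B-A are the same matchup
        running[pair] += 1
        by_season[season] = Counter(running)  # last snapshot in a season is year-end
    return by_season

def pair_counts(players, counts):
    """Counts for every possible pairing, including pairs that never met (0)."""
    return {pair: counts[pair] for pair in combinations(sorted(players), 2)}

def pair_features(rating_a, rating_b, matches_a, matches_b):
    """Log combined rating and log combined career matches for one pairing."""
    return log(rating_a + rating_b), log(matches_a + matches_b)

h2h = cumulative_h2h(matches)
print(h2h[2019][("A", "B")])  # 1
print(h2h[2020][("A", "B")])  # 3
print(pair_counts({"A", "B", "C"}, h2h[2020]))
```

Keeping the zero-count pairs matters: an average like the 0.6 quoted earlier only makes sense if pairs who have never met are counted too.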

Figure 1: Head-to-head by competitor combined ratings and career-to-date matches played, 2003 - Present.

The plot reminds me a bit of a high-jump ramp, with head-to-heads of 10+ being outliers among the large mass of low counts. But when those more frequent matchups occur, they do appear to be more common among the more experienced and more highly rated players. A head-to-head of zero or one is quite likely for everyone, but less so for stronger players with more years on tour.

Given the non-linear nature of the relationships in Figure 1, I used a generalized additive model (GAM) for the head-to-head count, with the log of combined skill and the log of combined career matches entering as a bivariate smooth. Letting $Y$ be the cumulative head-to-head for two players in a given season, the expected count is

$$ \log(E[Y]) = \mu + f(\text{log\_skill}, \text{log\_matches}; \theta) $$

and $Y \sim \text{Poisson}(E[Y])$. The $f(\cdot)$ is a smooth that allows us to capture non-linear patterns that might best describe the relationship between skill or match age and head-to-head counts.
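
It may help to write out what the Poisson assumption implies for the counts themselves. Using $\lambda$ as shorthand for $E[Y]$ (a symbol introduced here only for brevity), the probability of observing exactly $y$ meetings is

$$ P(Y = y) = \frac{\lambda^{y} e^{-\lambda}}{y!}, \qquad \lambda = \exp\left(\mu + f(\text{log\_skill}, \text{log\_matches}; \theta)\right) $$

so the chance that two players have never met is $P(Y = 0) = e^{-\lambda}$. A pairing with an expected count of 1, for example, would still show no meetings about 37% of the time, since $e^{-1} \approx 0.37$.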

The results from that model are summarized in the heatmap below. Here we see the expected head-to-head count by the total skill and total career matches of the competitors. What should immediately stand out is the large swathe of player types expected to have 1 or fewer meetings. Even a head-to-head of 5 would be too small to say much about matchup effects, yet you would need players with 1,600 or more combined career matches and a combined rating of 4000+ (both in the top quartile of these attributes among possible matchups) to expect even that much history between them.

Figure 2: Expected head-to-head for men's singles according to their combined skill and career matches played.

When they met in Doha, Federer and Basilashvili had a combined rating of 4400+ and just over 1,700 combined career tour-level matches, which would suggest an expected head-to-head closer to 6 than 1.

With a model for expected head-to-head, it is easy to look for head-to-heads that are well below expectation. I used the model to find the most surprising zero counts among head-to-heads. The following table lists 10 of the most surprisingly sparse. We see the Big 3 in all of these, as the long experience and high ratings of these players mean they should have played just about every good player out there by this point. Yet we’ve so far been denied clashes between Andrey Rublev and Djokovic, as well as Dustin Brown and Roger Federer. Interestingly, Ricardas Berankis has escaped that challenge twofold, having never met Roger Federer or Rafa Nadal so far in his career. A pro player of his talent and experience would have been expected to have played them each 3 to 4 times by now.

Table 1. Current ten most surprising n=0 head-to-heads.
Player	Opponent	Expected H2H
Novak Djokovic	Pablo Cuevas	4.4
Ricardas Berankis	Roger Federer	4.2
Rafael Nadal	Ricardas Berankis	3.9
Roger Federer	Pablo Andujar	3.9
Novak Djokovic	Andrey Rublev	3.8
Dustin Brown	Roger Federer	3.8
Martin Klizan	Roger Federer	3.7
Jiri Vesely	Roger Federer	3.6
Tobias Kamke	Novak Djokovic	3.4
Novak Djokovic	Radu Albot	3.2
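
To give a feel for how unlikely these zeros are, the Python sketch below ranks never-played pairings by their expected count and attaches the Poisson probability of zero meetings, $e^{-\lambda}$. The post does not spell out how "most surprising" was scored, so ranking by expected count is an assumption; under a Poisson model it gives the same order as ranking by the probability of a zero. The function names are hypothetical, and the expected values are three rows copied from Table 1.

```python
import math

# Expected head-to-heads for three rows of Table 1.
expected_h2h = {
    ("Novak Djokovic", "Radu Albot"): 3.2,
    ("Novak Djokovic", "Pablo Cuevas"): 4.4,
    ("Ricardas Berankis", "Roger Federer"): 4.2,
}
# Observed meetings; every pair in Table 1 has none.
observed_h2h = {pair: 0 for pair in expected_h2h}

def prob_zero(expected):
    """P(Y = 0) for a Poisson count with the given mean."""
    return math.exp(-expected)

def most_surprising_zeros(expected, observed, top=10):
    """Never-played pairs, ordered from highest to lowest expected count."""
    zeros = [(pair, lam) for pair, lam in expected.items()
             if observed.get(pair, 0) == 0]
    zeros.sort(key=lambda item: item[1], reverse=True)
    return [(pair, lam, prob_zero(lam)) for pair, lam in zeros[:top]]

for pair, lam, p0 in most_surprising_zeros(expected_h2h, observed_h2h):
    print(f"{pair[0]} vs {pair[1]}: expected {lam:.1f}, P(no meetings) = {p0:.3f}")
```

Under the Poisson assumption, an expected head-to-head of 4.4 corresponds to roughly a 1% chance of no meetings at all, and an expected 3.2 to roughly 4%.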

Even a pretty simple model for head-to-head counts leads to some interesting conclusions. First, adjusting for head-to-head is actually a pretty challenging statistical problem in tennis, given that most head-to-heads have a sample size of 1 or less. The paucity of long head-to-heads also suggests that the belief in style clashes or players who “don’t match up well” against each other is either overblown or based on some gut feeling that can’t be easily tested.

Finally, the small-sample rivalries among tennis players make you wonder whether the sport, in using knockout designs as its main event structure, might be missing the boat in some sense. If rivalries drive a lot of the interest in individual sports, shouldn’t the sport look for ways to encourage deeper head-to-heads? Weighting draw selection by the inverse of head-to-heads could be the least radical option. Some bolder choices could involve using more round robin stages with shorter match formats to allow more meetings while keeping the overall time on court in a season at its current level. If designed well, such a structure could not only add to head-to-head counts but even improve the chance that better players advance out of the group stage and into later rounds. I’m not sure the tour finals is the right model here, even setting end-of-season effects aside, but some variants on that theme could be worth considering.
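
As a rough illustration of the draw-weighting idea, here is a toy Python sketch that picks an opponent with probability proportional to the inverse of the existing head-to-head. The `1 / (1 + meetings)` form is an assumption (the `+ 1` just avoids dividing by zero for pairs who have never met). The function names and the example counts are hypothetical, and real draws would also have to respect seeding.

```python
import random

def draw_weights(h2h_counts):
    """Weight each candidate opponent by 1 / (1 + prior meetings)."""
    return {opp: 1.0 / (1 + n) for opp, n in h2h_counts.items()}

def pick_opponent(h2h_counts, rng):
    """Sample one opponent, favouring those with fewer prior meetings."""
    weights = draw_weights(h2h_counts)
    opponents = list(weights)
    return rng.choices(opponents, weights=[weights[o] for o in opponents], k=1)[0]

# Hypothetical prior meetings between one player and three candidate opponents.
prior_meetings = {"Opponent X": 0, "Opponent Y": 3, "Opponent Z": 9}

print(draw_weights(prior_meetings))
print(pick_opponent(prior_meetings, random.Random(2021)))
```

With these made-up counts, the never-played opponent is four times as likely to be drawn as the one met three times, and ten times as likely as the one met nine times.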