Cape Town celebrates R and tennis data science at satRday

While Roger Federer earned his 17th straight win of 2018, tennis data science was taking center stage at Cape Town’s satRday conference.

Opportunities to dig into the ‘mechanics’ of being a data scientist in the tennis industry are rare. I was lucky to have a chance to do just that this Saturday, thanks to Andrew Collier (@DataWookie) and his colleagues, who organized the 2018 satRday Cape Town conference.

In its second run in Cape Town, the 2018 satRday featured a one-day line-up of R developers and enthusiasts talking about the ways they are using and extending the language. Of the 23 speakers, 20 are currently working in South Africa, and it was exciting to see how they are applying and contributing to R.

As one of the speakers, I was able to share how I am using R for real-time tennis analytics at the Game Insight Group. There was also a workshop prior to the conference, where I got my hands dirty with participants interested in tools for sports data science and showed them some of the ways they can collect, wrangle, and model sports data more effectively in R. All of the workshop and keynote materials are available on GitHub.

Although the conference was open to any R topic, it was exciting to see a number of presentations that were highly relevant to sport. Neil Watson (@rugbystatsguy), a Lecturer at the University of Cape Town, showed us how he is using R to analyze and visualize momentum in Rugby Union.

Caption: Neil Watson's momentum heatmaps

Sean Soutar, a student at the University of Cape Town, showed how we can use Docker and RSelenium to scrape dynamic data from the web, a set of tools I use quite frequently for gathering tennis data.

Robert Bennetto extolled the benefits of sp for working with spatial data, which could be an especially useful resource for deriving spatial metrics in sport. Peter Kamerman’s entertaining talk on purrr convinced me that I should be using map and pmap more in my attempts at functional programming in R. I was also very grateful to David Lubinsky, who closed the talks of the day with a helpful tour of profvis, Winston Chang’s tool for profiling R code. It will be a huge help for identifying bottlenecks in my code going forward.

Caption: Peter Kamerman's purrr examples

In addition to these takeaways for statistical computing in sport, I was thrilled to see the amazing work R-Ladies are doing for the R community. Two deserve particular praise. One is Wiebke Toussaint, who, with truly boundless energy, shared with us how she is using ckanr to make data in energy research available to the world through an open data portal.

Caption: Maëlle!

But it was Maëlle Salmon who gave the kick-off keynote of the day and set such an engaging and exciting tone for the program. Maëlle is a Research Software Engineer at rOpenSci, where she is working with a team dedicated to making package development in R a more collaborative, fast, and joyful experience. In her talk, she showed how rOpenSci is helping developers make better software, and she sprinkled a lot of magick on top along the way. It was truly the highlight of the event.