Sunday, April 14, 2019

Data and Technology in Major League Baseball


On the surface, a baseball game looks simple: a wooden bat, a stitched ball, and leather gloves. But behind the basic nature of the game, each of the 30 stadiums has at least six stereoscopic cameras and a Doppler radar tracking everything that happens on the field. This provides MLB and fans with great data and information through MLB Statcast. Beyond that, every team has a group of data scientists in its front office in charge of finding the next competitive advantage in the sport, and players are using the data to transform their performance. Here is an example of what Statcast can track in a game and how it can be used in a broadcast:


The MLB Statcast system is an incredible optics system. It starts with the Trackman Doppler System, which takes 27 different measurements for everything related to the ball at 20,000 frames per second. These measurements include pitch velocities, spin rates, and exit velocities. The cameras track the movement of every player on the field, providing coordinates of all the players up to 30 times per second. All of this information is housed by each team and MLB to develop new KPIs for players and teams. Unlike in other professional sports, much of this is publicly available on baseballsavant.com, where fans can not only view all of this information but also see leaderboards, view personalized graphics, or download raw data and do their own analysis.
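To give a rough idea of what "doing your own analysis" on a downloaded file might look like, here is a minimal Python sketch that averages exit velocity per player. The rows are made up, and the column names (player_name, launch_speed, launch_angle) are my assumptions for illustration, so check them against the header of any real export before reusing this.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows standing in for a downloaded CSV export.
# The column names are assumptions for illustration only.
SAMPLE_CSV = """player_name,launch_speed,launch_angle
Player A,101.2,24
Player A,95.4,12
Player B,88.0,-5
Player B,,30
Player B,92.0,18
"""


def average_exit_velocity(csv_text):
    """Return {player: mean exit velocity in mph}, skipping rows with no reading."""
    totals = defaultdict(lambda: [0.0, 0])  # player -> [sum of speeds, count]
    for row in csv.DictReader(io.StringIO(csv_text)):
        speed = row["launch_speed"]
        if not speed:
            # Some rows may have no tracked exit velocity; leave them out.
            continue
        entry = totals[row["player_name"]]
        entry[0] += float(speed)
        entry[1] += 1
    return {name: total / count for name, (total, count) in totals.items()}


averages = average_exit_velocity(SAMPLE_CSV)
for name, mph in averages.items():
    print(f"{name}: {mph:.1f} mph")
```

With a real file, the same function could be fed the file's text instead of the inline sample.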

Statcast provided information on 2.1 million pitches from 2015 to 2017, so what do teams do with all of it? The use of this data has grown immensely and continues to grow. Almost every team has an analytics department in its front office. These departments influence which players teams try to acquire and how much to pay them, and their work includes building statistical models to project player and team performance, developing valuations for draft picks, and predicting the probability of a pitcher getting injured. These models have grown as Statcast data has shown how the launch angle of a batted ball predicts hit probability, or how the spin rate of a pitch predicts the probability of a strikeout.
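As a simple illustration of the launch angle idea, the sketch below groups batted balls into 10-degree launch angle buckets and computes the share that went for hits in each bucket. The data is invented, and this is only basic binning, not the kind of model a team would actually build; the function name and bucket width are my own choices.

```python
from collections import defaultdict

# Made-up batted balls: (launch angle in degrees, 1 if it was a hit, else 0).
BATTED_BALLS = [
    (-12, 0), (-3, 0), (4, 1), (8, 0),
    (12, 1), (15, 1), (18, 1), (19, 0),
    (24, 1), (28, 0), (35, 0), (42, 0),
]


def hit_rate_by_bucket(batted_balls, width=10):
    """Return {bucket start angle: fraction of batted balls that were hits}."""
    counts = defaultdict(lambda: [0, 0])  # bucket start -> [hits, total]
    for angle, is_hit in batted_balls:
        # Floor division also places negative angles in the right bucket,
        # e.g. -3 falls in the bucket starting at -10.
        start = (angle // width) * width
        counts[start][0] += is_hit
        counts[start][1] += 1
    return {start: hits / total for start, (hits, total) in sorted(counts.items())}


rates = hit_rate_by_bucket(BATTED_BALLS)
for start, rate in rates.items():
    print(f"{start:>4} to {start + 10:>3} degrees: {rate:.2f}")
```

In this toy data the middle buckets have the highest hit rates, which is the sort of pattern a real hit probability model would try to capture with far more data and more inputs, such as exit velocity.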

Teams bring in people who know how the game works and can do the analysis. For example, here is a snippet of the requirements from a job posting for an entry-level position in the Dodgers’ analytics department: “Experience building and validating mathematical, statistical, and/or machine learning models, preferably in Python or R.” These aren’t just skills from your basic statistics class, and it takes a lot more than being able to play fantasy sports to get an entry-level job with a team.

Where do they keep all of this data? Each team has its own system and database where it houses its data and information. Here is an example from a recent job posting with the Houston Astros for a Baseball System Developer, with an essential duty that I know Dr. Weisband will love: “Collaborate with a cross-functional agile team on designing, testing, implementing, and maintaining scalable software for Baseball Operations.” Not only do teams have to store Statcast data, but they also have to track every player in their own organization, every player in every other organization, and everyone from amateur players in the United States to international prospects in places like Japan, the Dominican Republic, and Cuba. Teams need to build software to efficiently track and securely store all of this information.

Not only do these jobs ask for very skilled and intelligent people, they get them. Two of the most well-known examples (in the baseball analytics world) are Farhan Zaidi and Sig Mejdal. Before being recently hired by the San Francisco Giants to run their baseball operations department, Zaidi spent time with the Dodgers and Athletics; he is an MIT graduate, got his PhD from Cal Berkeley, and worked at BCG. Mejdal, now with the Orioles and formerly with the Astros, was an engineer at NASA and Lockheed Martin. Both were extremely successful with the Dodgers and Astros respectively, which is why they received promotions with other teams. While baseball is merely a child's game, the data and information have so much depth and breadth that it takes rocket scientists to analyze them well.

While this blog post provides information on some of the technology and data used, this truly is just the tip of the iceberg, and the game clearly goes beyond the simple tools used to play it (although companies are designing bat handles to maximize launch angle). If you’re interested in how the desire for data and analytics started, I would encourage you to read about Bill James or read Michael Lewis’s book Moneyball to see how teams first started using "sabermetrics". For more information on how Statcast information is changing the game of baseball, here is a great video. Great websites to visit for more data-driven information include Fangraphs and Baseball Prospectus.

Questions:

1)    Did you know how much data and technology are used in baseball? Have you heard of Statcast? Have you heard phrases like “spin rate” or “launch angle” used in a broadcast while watching a baseball game?
2)    Have you heard of data being used in other sports, and how does it relate to baseball?
3)    If you don’t like baseball or think it is boring and slow, does this information make you more or less interested in the sport?

2 comments:

  1. 1) I did know there was a lot of analytics used in baseball, but I didn't realize exactly how much. I have heard all of those terms before, but I didn't know that teams placed as much emphasis as they do on them.

    2) I was listening to something yesterday actually about how in the NFL and NBA they place too much emphasis on statistics. The broadcaster's point was that in those select sports, the game is more of a narrative and sometimes statistics are taken out of context when you don't apply it to a correct situation. I don't think this applies to baseball as much because it is more one-on-one than a team against another team so there are less moving parts at once.

    3) I generally like baseball, but I think it can be slow at times. The statistics actually make it more interesting to understand and compare players during the downtimes.

    1. It is interesting how analytics and statistics are different in the NBA and the NFL compared to MLB. Since the pitcher delivering a pitch essentially puts the ball in play, it is a lot easier to collect and measure data. While football has a lot of stoppage, what becomes important is how people move. Basketball is constantly moving, and it is a lot harder to create single data points. So far, much of the basketball analytics movement has come in scoring and the increase in 3-point shots. Player tracking data has grown and is implemented in both the NBA and NFL, but the information is not publicly available in real time as it is in baseball.
