On the surface, a baseball game looks simple: a wooden bat, a stitched ball,
and leather gloves. But behind that simplicity, every one of the 30 stadiums
has at least six stereoscopic cameras and a Doppler radar tracking everything
that happens on the field. This gives MLB and fans a wealth of data through
MLB Statcast. Beyond that, every team has a group of data scientists in its
front office looking for the next competitive advantage, and players are using
the data to transform their performance. Here is an example of what Statcast can track in a game and how it can be used in a broadcast:
The MLB Statcast system is an incredible optics system. It
starts with the TrackMan Doppler radar system, which takes 27
different measurements of everything related to the ball at 20,000 frames per
second. These measurements include pitch velocities, spin rates, and exit velocities.
The cameras track the movement of every player on the field, providing each
player's coordinates up to 30 times per second. All of this information is stored
by each team and by MLB and used to develop new KPIs for players and teams. Unlike
in other professional sports, much of it is publicly available on
baseballsavant.com, where fans can not only view the information but also see
leaderboards, view personalized graphics, or download raw data and do their own analysis.
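For readers who want to try the "download raw data" route, here is a minimal Python sketch of the kind of analysis that becomes possible. The column names (player_name, launch_speed) and the sample rows are assumptions made up for illustration, not real Statcast records. A real export from baseballsavant.com may use different headers, so check the file before relying on them.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample standing in for a downloaded CSV export.
# Column names and values are illustrative assumptions, not real data.
SAMPLE_CSV = """player_name,launch_speed
Batter A,101.5
Batter A,95.5
Batter B,88.0
Batter B,
Batter B,92.0
"""

def average_exit_velocity(csv_text):
    """Return {player: mean exit velocity in mph}, skipping blank readings."""
    readings = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = row["launch_speed"].strip()
        if value:  # rows without a batted ball have no exit velocity
            readings[row["player_name"]].append(float(value))
    return {name: sum(v) / len(v) for name, v in readings.items()}

def leaderboard(csv_text):
    """Sort players from highest to lowest average exit velocity."""
    averages = average_exit_velocity(csv_text)
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    for name, mph in leaderboard(SAMPLE_CSV):
        print(f"{name}: {mph:.1f} mph")
```

To run this against a real download, read the file's text and pass it to leaderboard() in place of SAMPLE_CSV, after adjusting the column names to match the file.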
From 2015 to 2017, Statcast provided information on 2.1
million pitches, so what do teams do with it? The use of this data has grown
immensely and continues to grow. Almost every team has an analytics department
in its front office. Their work influences which players a team tries to acquire
and how much to pay them, and it includes building statistical models to project
player and team performance, developing valuations for draft picks, and predicting
the probability of a pitcher getting injured. These models have grown as Statcast
data has shown how the launch angle of a batted ball predicts hit probability,
or how the spin rate of a pitch predicts the probability of getting a strikeout.
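To make the launch angle idea concrete, here is a minimal Python sketch of an empirical hit probability: group batted balls into launch-angle buckets and compute the share in each bucket that went for hits. The batted-ball records, the 10-degree bucket width, and the function name are all assumptions for illustration. Teams' actual models are not public and are likely far richer (they would probably also use exit velocity, for example).

```python
from collections import defaultdict

# Made-up batted balls: (launch angle in degrees, True if it was a hit).
# Illustrative only; these are not real Statcast records.
BATTED_BALLS = [
    (-12, False), (-5, False), (3, True), (8, False),
    (12, True), (15, True), (18, False), (24, True),
    (27, True), (33, False), (41, False), (48, False),
]

def hit_rate_by_launch_angle(batted_balls, bucket_width=10):
    """Return {bucket start angle: fraction of batted balls that were hits}."""
    hits = defaultdict(int)
    total = defaultdict(int)
    for angle, was_hit in batted_balls:
        # Floor division puts 12 and 18 in the bucket starting at 10,
        # and -5 in the bucket starting at -10.
        bucket = (angle // bucket_width) * bucket_width
        total[bucket] += 1
        hits[bucket] += int(was_hit)
    return {b: hits[b] / total[b] for b in sorted(total)}

if __name__ == "__main__":
    for bucket, rate in hit_rate_by_launch_angle(BATTED_BALLS).items():
        print(f"bucket starting at {bucket:>4} degrees: {rate:.0%} hits")
```

With enough real batted balls, the same bucketing would show which launch angles tend to produce hits; a proper model would replace the buckets with a fitted curve.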
Teams bring in people who both know how the game works
and can do the analysis. For example, a job posting for an entry-level
position in the Dodgers’ analytics department included this requirement:
“Experience building and validating mathematical, statistical, and/or machine
learning models, preferably in Python or R.” These aren’t just skills from your
basic statistics class, and it takes a lot more than being able to play
fantasy sports to get an entry-level job with a team.
Where do they keep all of this data? Each team has its own
system and database where it houses its data and information. Here is an
example, from a recent job posting with the Houston Astros for a Baseball
System Developer, that I know Dr. Weisband will love. Among the essential
duties: “Collaborate with a cross-functional agile team on
designing, testing, implementing, and maintaining scalable software for Baseball
Operations.” Not only do teams have to store Statcast data, but they also have to
track every player in their own organization, every player in every
other organization, and amateur players ranging from the United States to international
prospects in places like Japan, the Dominican Republic, and Cuba. Teams need to
build software to efficiently track and securely store all of this information.
Not only do these
jobs ask for very skilled and intelligent people, they get them. Two of the
most well-known examples (in the baseball analytics world) are Farhan Zaidi and
Sig Mejdal. Zaidi, who was recently hired
by the San Francisco Giants to run their baseball operations department after
time with the Dodgers and Athletics, is an MIT graduate who got his
PhD from Cal Berkeley and worked at BCG. Mejdal, now with the Orioles and
formerly with the Astros, was an engineer at NASA and Lockheed Martin. Zaidi and Mejdal were extremely successful with the Dodgers and Astros respectively, which is why they received promotions with other teams. While
baseball is merely a child's game, its data and information have so much depth and breadth
that it takes rocket scientists to analyze it well.
While this blog post provides information on some of the technology
and data used, this truly is just the tip of the iceberg, and the game clearly
goes beyond the simple tools used to play it (although companies are designing
bat handles to maximize launch angle). If you’re interested in how the
desire for data and analytics started, I would encourage you to read about Bill James or read Michael Lewis’s book Moneyball
to see how teams first started using "sabermetrics". For more information on how Statcast information is changing the game of baseball, here is a great video. Great websites to visit for more data-driven reading include
Fangraphs and Baseball Prospectus.
Questions:
1) Did you know how much data and technology are used in baseball? Have you heard of Statcast? Have you heard phrases like “spin
rate” or “launch angle” used in a broadcast while watching a baseball game?
2) Have you heard of data being used in other sports, and
how does it relate to baseball?
3) If you don’t like baseball or think it is boring
and slow, does this information make you more or less interested in the sport?
1) I did know there was a lot of analytics used in baseball, but I didn't realize exactly how much. I have heard all of those terms before, but I didn't know that teams placed as much emphasis as they do on them.
2) I was actually listening to something yesterday about how the NFL and NBA place too much emphasis on statistics. The broadcaster's point was that in those sports, the game is more of a narrative, and statistics are sometimes taken out of context when they aren't applied to the right situation. I don't think this applies to baseball as much, because it is more one-on-one than team against team, so there are fewer moving parts at once.
3) I generally like baseball, but I think it can be slow at times. The statistics actually make it more interesting to understand and compare players during the downtimes.
It is interesting how analytics and statistics are different in the NBA and the NFL compared to MLB. Since the pitcher delivering a pitch essentially puts the ball in play, it is a lot easier to collect and measure data. While football has a lot of stoppages, what becomes important is how people move. Basketball is constantly moving, and it is a lot harder to create single data points. So far, much of the basketball analytics movement has come in scoring and the increase in 3-point shots. Player tracking data has grown and is used in both the NBA and NFL, but unlike in baseball, the information is not publicly available in real time.