R: Language of Baseball Statistics

Over the last few days I finally started reading one of my Christmas gifts: Analyzing Baseball Data with R. (Can anyone really read an academic text? It’s more like reading a few sentences and banging your head on a desk.) If you’re reading this, you’re already familiar with baseball; it’s a game that loves its numbers. Fortunately, R is a programming language that loves its numbers, too.

I am by no means an intermediate user, much less an expert, but even some basic tinkering with R and RStudio has let me quickly generate simple visuals that lead to questions and answers as we all approach draft season, or free agency if you participate in dynasty leagues.

Using R and FanGraphs exports, it is quite easy to visualize simple categories. Perhaps in the future it will be just as easy to do more than that.

Once upon a time, I had no clue about BABIP. Now that I do, it is one of the first things I like to look at during the pre-season.

2015 Hitters with >400 PA; BABIP vs. AVG

As expected, BABIP and AVG are positively correlated.
Bottom 5 BABIP: S. Drew, A. Pujols, C. Utley, B. McCann, L. Valbuena
Top 5 BABIP: O. Herrera, M. Cabrera, D. Gordon, P. Goldschmidt, K. Bryant
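
For anyone who wants to try something similar, here is a minimal sketch in base R of how a plot and ranking like the ones above might be produced. It is not necessarily how the chart in this post was made. The column names (Name, PA, AVG, BABIP), the helper names, and the file name in the usage comment are all assumptions about what a FanGraphs CSV export looks like, so adjust them to match your own file.

```r
# Sketch only: column names (Name, PA, AVG, BABIP) are assumed, not confirmed.

# Hypothetical helper: keep hitters above a PA cutoff and return the
# names with the lowest and highest BABIP.
babip_extremes <- function(hitters, min_pa = 400, n = 5) {
  qualified <- hitters[hitters$PA > min_pa, ]
  ranked <- qualified[order(qualified$BABIP), ]   # ascending BABIP
  list(
    bottom = head(ranked$Name, n),                # lowest BABIP first
    top    = rev(tail(ranked$Name, n))            # highest BABIP first
  )
}

# Hypothetical helper: scatter plot of BABIP vs. AVG with a dashed
# least-squares line; invisibly returns the correlation coefficient.
plot_babip_avg <- function(hitters, min_pa = 400) {
  qualified <- hitters[hitters$PA > min_pa, ]
  plot(qualified$BABIP, qualified$AVG,
       xlab = "BABIP", ylab = "AVG",
       main = "Hitters above the PA cutoff: BABIP vs. AVG")
  abline(lm(AVG ~ BABIP, data = qualified), lty = 2)
  invisible(cor(qualified$BABIP, qualified$AVG))
}

# Usage (the file name is a placeholder for your own export):
# hitters <- read.csv("fangraphs_2015_batting.csv", stringsAsFactors = FALSE)
# babip_extremes(hitters)
# plot_babip_avg(hitters)
```

Keeping the ranking separate from the plotting means the top and bottom lists can be checked without drawing anything.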

It’s pretty common knowledge that stolen bases are drying up relative to home runs. Simple distributions show the disparity: far fewer players contribute more than a handful of stolen bases than contribute more than a handful of home runs.

2015 Stolen Bases

Home runs for comparison:
2015 Home Runs
And both:

2015 Hitters with 400 PA; HR and SB

Only 4 players had 20/20 seasons last year (though J. Upton just missed with 19 SB):
Pollock (20 HR / 39 SB)
Goldschmidt (33 HR / 21 SB)
Machado (35 HR / 20 SB)
Braun (25 HR / 24 SB)
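
A rough base R sketch of the histograms and the 20/20 filter follows. The first four rows of the example data are the players listed above; the last row is a made-up placeholder included only so the filter has something to exclude. The column names (Name, HR, SB) and the helper names are assumptions, not a description of how the charts in this post were built.

```r
# Rows 1-4 are the 20/20 players listed in the post.
# "Example Player" is a made-up placeholder, not a real stat line.
hitters <- data.frame(
  Name = c("Pollock", "Goldschmidt", "Machado", "Braun", "Example Player"),
  HR   = c(20, 33, 35, 25, 12),
  SB   = c(39, 21, 20, 24, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: players at or above the threshold in both HR and SB.
twenty_twenty <- function(hitters, threshold = 20) {
  hitters[hitters$HR >= threshold & hitters$SB >= threshold, ]
}

# Hypothetical helper: side-by-side histograms of SB and HR.
plot_hr_sb <- function(hitters) {
  old <- par(mfrow = c(1, 2))   # two panels in one row
  on.exit(par(old))             # restore the previous layout afterwards
  hist(hitters$SB, xlab = "SB", main = "Stolen Bases")
  hist(hitters$HR, xlab = "HR", main = "Home Runs")
}

twenty_twenty(hitters)$Name
# plot_hr_sb(hitters)   # more informative with a full season export
```

With a full export in place of the toy data frame, the same two calls should work unchanged as long as the column names match.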

Last for now is a basic presentation of ERA vs. xFIP, including labels (which I finally learned how to add) for the 3 biggest under- and over-achievers.


2015 Pitchers with 100 IP; ERA vs. xFIP

Cursory review reveals:

  • Chris Young’s ERA typically outperforms his xFIP
  • Marco Estrada’s typically does not
  • Hector Santiago’s ERA typically outperforms his xFIP
  • Mat Latos’ ERA does not typically underperform his xFIP
  • Ditto Rick Porcello
  • Ditto Michael Pineda
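
For the curious, here is one way the labeled ERA vs. xFIP plot could be sketched in base R. It is an illustration under assumptions, not a record of the exact code behind the chart: the column names (Name, IP, ERA, xFIP), the helper names, and the choice of "at least 100 IP" as the cutoff are all guesses. Over-achievers are taken to be the pitchers whose ERA is furthest below their xFIP, and under-achievers the reverse.

```r
# Sketch only: column names (Name, IP, ERA, xFIP) are assumed.

# Hypothetical helper: the n pitchers whose ERA most beats xFIP ("over")
# and the n whose ERA most trails xFIP ("under").
era_xfip_outliers <- function(pitchers, min_ip = 100, n = 3) {
  qualified <- pitchers[pitchers$IP >= min_ip, ]
  qualified$diff <- qualified$ERA - qualified$xFIP   # negative = over-achiever
  ranked <- qualified[order(qualified$diff), ]
  list(over = head(ranked, n), under = tail(ranked, n))
}

# Hypothetical helper: scatter plot with name labels on the outliers only.
plot_era_xfip <- function(pitchers, min_ip = 100, n = 3) {
  qualified <- pitchers[pitchers$IP >= min_ip, ]
  out <- era_xfip_outliers(pitchers, min_ip, n)
  labelled <- rbind(out$over, out$under)
  plot(qualified$xFIP, qualified$ERA,
       xlab = "xFIP", ylab = "ERA", main = "ERA vs. xFIP")
  abline(0, 1, lty = 2)                              # line where ERA equals xFIP
  text(labelled$xFIP, labelled$ERA,
       labels = labelled$Name, pos = 3, cex = 0.7)   # label above each point
}

# Usage (the file name is a placeholder for your own export):
# pitchers <- read.csv("fangraphs_2015_pitching.csv", stringsAsFactors = FALSE)
# plot_era_xfip(pitchers)
```

Labeling only the outliers keeps the plot readable; labeling every point tends to turn the chart into a smear of names.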

Quick thoughts (after all, it was indeed cursory):

Ignore Marco Estrada unless you are counting on the ~4 ERA. Young and Santiago are not much more than afterthoughts in most leagues. But in deeper(est?) leagues, they may be worth a cheap annual contract / late pick.

One can point to Porcello and Pineda’s higher-than-career-average HR/FB% as a major reason why they underperformed. Nothing quite so immediate stands out for Mat Latos. He warrants future consideration, or maybe not, considering that even the deeper leagues may be Latos-intolerant.
