Raburn and Sample Sizes


As Bryan Craves noted yesterday, Tigers fans (and management) are justifiably concerned – despite Ryan Raburn’s performances at the plate in spring training – about whether or not he’ll be able to hit in the colder months of the regular season.

Over his career, spanning parts of 6 major league seasons, Raburn has really only been a good hitter in August and September.  But… he has been such a good hitter in those two months that his career averages really don’t look too shabby (at least in the post-juice era).  With the way he has hit in spring training – and given the fact that he has hit passably in April but abominably in May – I’ve been a little puzzled as to why Raburn hasn’t hit early and whether what we’re seeing is bad luck or something more.

For one thing, we don’t have an enormous sample size when it comes to Ryan Raburn at the plate.  Though he has been around for a while, he has usually been a part-time player and has amassed only 1368 at-bats over all that time.  That’s less than half the number for Delmon Young.  Moreover, Raburn hasn’t played as much in first halves as he has in second halves during those 6 years, so the sample of at-bats in cold months isn’t that large.  Any time numbers look bad (or good) in a small sample, there’s a decent chance that you can chalk it all up to luck.  After all, if Gerald Laird has a tremendous month, none of us will believe that he has turned the corner as a hitter.

What I decided to do was to test the statistical significance of Raburn’s differential batting averages in each and every month.  I won’t go into the details of exactly how “statistical significance” is determined, but in a nutshell what it means is this:  If we assume that there is some ‘true Raburn’ and that ‘true Raburn’ has an exactly 27% chance of getting a hit in any at-bat (like rolling a weighted die), there is some chance that over any number of at-bats the actual number of hits will deviate by some amount from that 27%.  We can look at the number of hits we get in (for example) April vs. the number of hits we would have expected from ‘true Raburn’ given the number of at-bats he has had in April and see how much deviation there is.  Statistically, it’s then possible to estimate the % chance that the Raburn we see in April is not the ‘true Raburn’ but some ‘inferior Raburn’ (given that he hasn’t hit particularly well in April).  Statisticians would typically set a fairly high bar for results that are ‘statistically significant’ as opposed to merely suggestive – let’s say a 90% chance that ‘April Raburn’ is worse than ‘true Raburn’.  A bigger deviation and/or a bigger sample would mean that we’re more likely to find that the difference between that month’s Raburn and ‘true Raburn’ is statistically significant.
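For anyone who wants to see the weighted-die idea in action, here is a minimal sketch in Python.  It is not necessarily the test used for the numbers in this post; it assumes an exact binomial calculation, and the monthly hit and at-bat totals are hypothetical, made up purely for illustration (they are not Raburn’s real splits).  The function name is also just a placeholder.  What it computes is the chance that a hitter with a fixed 27% hit probability would, through luck alone, end up with a given number of hits or fewer.

```python
from math import comb


def prob_at_most(hits: int, at_bats: int, true_avg: float) -> float:
    """Chance that a hitter whose every at-bat is a hit with probability
    `true_avg` collects `hits` or fewer hits in `at_bats` at-bats
    (the lower tail of a binomial distribution)."""
    return sum(
        comb(at_bats, k) * true_avg**k * (1 - true_avg) ** (at_bats - k)
        for k in range(hits + 1)
    )


TRUE_AVG = 0.27  # the 'true Raburn' hit probability assumed in the text

# Hypothetical months (NOT real data): both work out to a .220 average,
# but one has three times as many at-bats as the other.
small_month = prob_at_most(22, 100, TRUE_AVG)
large_month = prob_at_most(66, 300, TRUE_AVG)

print(f".220 over 100 at-bats: {small_month:.3f} chance from luck alone")
print(f".220 over 300 at-bats: {large_month:.3f} chance from luck alone")
```

The same .220 average is much harder to explain away as bad luck over 300 at-bats than over 100, which is the point about bigger samples made above.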

But… as I already mentioned, sample sizes for Raburn aren’t that huge and there is a lot of randomness inherent in baseball.  Great hitters have streaks and slumps just like the Raburns and Inges of the world.   The only month in which Raburn has been not only worse than the norm but worse in a statistically significant fashion is the month of May.  For May, and May alone, there is (statistically speaking) only a 2-3% chance that ‘May Raburn’ is the same as ‘true Raburn’.  For each and every other month, Raburn’s numbers are within the plausible range for ‘true Raburn’ to luck his way into.  For April there is only a 54% chance that he’s actually worse than ‘true Raburn’, for June only a 45% chance.  And while he has hit well in August and September, there is only a 64% and 65% chance (respectively) that those Raburns are actually better than ‘true Raburn’ and not just finally getting lucky.

Unfortunately, this doesn’t provide any real solace:  Raburn may be genuinely awful in spring and early summer – I certainly can’t say that he isn’t.  All I can say is that we don’t have statistical ‘proof’ that he is, and we won’t until he has had the opportunity to stink in those months for at least a few more years.