Way back in Spring Training, Economist author Dan Rosenheck gave a presentation at the MIT Sloan Conference that shed considerable light on how to improve projections by using small amounts of data to supplement the things that we think we already know. His beef was that while projection systems are pretty good as is, they do have weak spots when it comes to young players and others who should be expected to have more variance in their projections. He showed that even something generally thought of as meaningless, like Spring Training stats, could be incorporated into these projections to improve the results. Here are his slides for those that are curious. People love to talk about how a player is bound to regress or how some guy has outplayed his projections so far. Well, this will sort through HOW MUCH a player has diverged from his projections, and the workbook creates a new level of expectation that can be thought of as closer to the truth than either Zips alone or 2015 alone.
Spring training is around a month of mostly sloppy data, but what happens when we use something more meaningful, like the first two months of the season? Using his method of melding pre-season Zips with actual 2015 data, I think we end up with something that is even better.
Updated Zips 6-9 –> Excel Workbook (recommended for formatting)
Here’s a Google Doc for those that prefer that–> Google Doc
I would highly recommend downloading that as it will be easier to follow along from here on out.
The things that Mr. Rosenheck found important were Strikeouts per At Bat, Walks per Plate Appearance, Batting Average on Contact, Isolated Power on Contact, and Stolen Base Attempts per Opportunity. Using his methods we can calculate where Zips thought the player would be on the year and what the player has actually done, and then weight the two by the ABs or PAs Zips expected coming into the year and the actual number accrued so far for every player. If you flip over to the “Summary” tab you will see exactly this. Here’s a look at the Rays K/AB:
The Growth calculation shows the difference between the new expectation and the Zips one, so we see that Logan Forsythe has improved his expectations by around 18% from the beginning of the year. Most guys on the team have improved their strikeout rate, with the exception of Rivera (barely), Beckham, Souza, Cabrera, and Wilson. We can do this same thing for the other categories. Here’s BB/PA:
Here we see Souza making an impression, as he has drastically increased his expected walk rate. We see Forsythe on here again, showing improvement over a huge sample, and it’s not like Zips saw him as a hacker coming into the season. We can keep going down the line and look at BACON next:
Ignore Jaso and his lack of even a single ball in play (great trade!) and we see that Butler and DeJesus are getting more hits on balls in play than Zips expected. Rene Rivera is the laggard, which I don’t think is all that surprising to those that have watched him play. One thing you may have noticed up to this point is that the guys on the tails are those that have very little track record. This is exactly where the projection systems are going to miss most often. Contrast that with a guy like Longoria, who has a very long trail of performance, and you can see that he’s mostly pretty close to what Zips thought, with some smaller tweaks. This is my biggest contention for why projection systems may be an upgrade over pulling names out of a hat, but are still ripe for improvement. Teams that are filled with international free agents, or rookies, or guys that are old are going to be more noise than signal when compared to teams that are made up of guys that should have less variance by being in the prime of their careers with long track records. Let’s move on to ISOCON:
Here is where we see some huge divergence from the projections, with most of the biggest misses on guys that had virtually no history in the game. Zips is going to just call them league average and heavily regress to that point, but we can see where players really stand out, like Beckham and Butler and Souza. On the other end we see the catchers and Jennings and the 29-year-olds Longoria and Cabrera. Lastly, we’ll cover stolen bases per opportunity:
You should expect to see wild divergence here since these are going to be the smallest sample sizes for each player when we’re looking at times on first or second instead of balls in play or trips to the plate. All Star Joey Butler shows up again with more aggressiveness than Zips expected. And so on.
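The blending and Growth numbers on the Summary tab can be sketched in a few lines. This is only one plausible reading of the weighting described above, not a copy of the workbook's formulas: the function names, the opportunity-weighted average, and the sample numbers are all assumptions made for illustration.

```python
def blend_rate(zips_rate, zips_opps, actual_rate, actual_opps):
    """Opportunity-weighted average of the projected and the observed rate.

    Assumption: "opportunities" are AB for K/AB, PA for BB/PA, and so on.
    """
    total = zips_opps + actual_opps
    if total == 0:
        raise ValueError("no opportunities to weight by")
    return (zips_rate * zips_opps + actual_rate * actual_opps) / total


def growth(new_rate, zips_rate):
    """Relative change of the blended rate versus the original Zips rate."""
    return (new_rate - zips_rate) / zips_rate


# Hypothetical player (made-up numbers, not from the workbook):
# Zips expected a 0.200 K/AB over 400 AB; he has a 0.150 K/AB in 200 AB so far.
new_k = blend_rate(0.200, 400, 0.150, 200)
k_growth = growth(new_k, 0.200)

print(f"new K/AB expectation: {new_k:.3f}")
print(f"change vs. Zips:      {k_growth:+.1%}")
```

For strikeouts a negative change is the good direction, so a sheet that reports it as "improvement" would need the sign flipped for that one category. That sign convention is also an assumption here.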
Moving on to the “Growth” tab, I have listed each player’s growth rate for the new projection vs. the original pre-season Zips version so that we can get an idea of players that have risen or fallen in the various metrics. I’ve taken it a step further by creating z-scores for each category, totaling those, and then taking into account the harmonic mean of the plate appearances Zips expected and the number the player has actually accrued. This should help distinguish between small-sample wildness and players that have shown actual growth or loss in their games. Here’s the Rays:
I have sorted here by the zTotal column to show the players that have most outplayed and most underplayed their projections. Frosty has really outplayed projections, and it looks like stuff that can continue going forward because of the across-the-board improvement. He’s striking out a lot less while walking more and not only making more contact, but more powerful contact. Joey Butler is in a similar boat on the batted ball side, but at some point he is going to have to demonstrate the ability to take a walk or pitchers will feast on him. The larger point here is that all of the guys that have grossly outperformed projections are guys with very small track records. It would be foolish to continue to solely use Zips projections to show where the team, as a whole, is headed when these updated versions do a much better job of showing what to expect.
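For those who want to see the mechanics behind a zTotal-style column, here is a rough sketch. How the harmonic mean of plate appearances gets folded into the total is not spelled out above, so the scaling used here (each player's harmonic mean divided by the largest one in the group) is purely an assumption, as are the player labels and every number.

```python
from statistics import harmonic_mean, mean, pstdev


def z_scores(values):
    """Standardize a list of growth rates against the group mean and spread."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical growth rates for three made-up players.
players = ["A", "B", "C"]
growth_rates = {
    "K/AB": [0.10, -0.05, 0.01],   # assumed already signed so positive = better
    "BB/PA": [0.20, 0.00, -0.08],
}
zips_pa = [500, 300, 600]      # PA Zips expected (hypothetical)
actual_pa = [220, 40, 230]     # PA accrued so far (hypothetical)

# One z-score per player per category, then a simple sum per player.
z_by_cat = {cat: z_scores(vals) for cat, vals in growth_rates.items()}
z_total = [sum(z_by_cat[cat][i] for cat in growth_rates)
           for i in range(len(players))]

# The harmonic mean leans toward the smaller of the two PA counts,
# so a player with very few actual PA gets a small weight.
hm_pa = [harmonic_mean([z, a]) for z, a in zip(zips_pa, actual_pa)]
weights = [h / max(hm_pa) for h in hm_pa]
weighted_total = [zt * w for zt, w in zip(z_total, weights)]

rows = sorted(zip(players, z_total, weights, weighted_total),
              key=lambda row: row[3], reverse=True)
for name, zt, w, wt in rows:
    print(f"{name}: zTotal={zt:+.2f} weight={w:.2f} weighted={wt:+.2f}")
```

The harmonic mean fits this job better than a plain average because 300 expected PA and 40 actual PA should count as a small sample, not a medium one.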
The Rays have many more players doing better than expected, and while they will regress some, they’re not going to fall quite as far as most folks probably think. When looking at projections that say the Rays are an 81-win team the rest of the way or whatever, it’s important to recognize that the foundation for those projections is old data that doesn’t incorporate new information. I’m only looking at the hitting side of things, so I’ll leave the pitching side and the conversion to wins to smarter folks with more time on their hands, but I do want to show the Team Growth rates (tab Team G) to show which teams are exceeding expectations and which ones are not living up to their projections:
We see that even though the Rays offense has struggled to score runs they have actually improved their projections on offense the 5th most in baseball. Keep in mind that this is incorporating both pre-season Zips and the 2015 data to compare our new level of projection with the old so this should be seen as real improvement and the raising of a baseline not just flukey luck that is bound to regress. This IS the regression. The Rays offense has exceeded expectations and I would expect that to continue throughout the year for the players listed here.
A team like the Red Sox has underplayed where Zips thought they would be. This is a reflection of both over-expectations on the part of Zips and new expectations that the team will not be as good offensively as thought going forward. This is one example, but it would be foolish to use pre-season Zips as the reason why you think the Red Sox offense will improve. That may come to pass, but this is a team, as currently constructed, that you should not expect to hit for average or power nearly as well as was thought coming into the season. From time to time expectations must be adjusted to account for the most recent data, which should be carrying more weight, pound for pound.
The last tab of the workbook, titled “New Line,” uses these statistics to derive a new expected slash line, wOBA, and a very buggy wRC+ projection that doesn’t include league differences. You’re probably right when you say that Joey Butler will not go on to hit .330/.362/.520 the rest of the way, but you’d be just as right to argue that he’s also not going to put up his Zips-projected line of .225/.306/.328 that was based on a dozen plate appearances and heavy regression (as well as the secret sauce that Dan Szymborski ladles oh so well). We can have a new expectation that takes both of these things into account and sees him putting up a line of .260/.311/.392 the rest of the way. I think most Rays fans would find that acceptable from a guy they had never even heard of coming into the season.
Butler is but one example of why we need to be taking recent performance into account when projecting how a player or a team will play going forward. When I update this in a month we’ll have an even better estimate of how the player/team will do over the rest of the season, and so on. Those that carry a big microphone in this industry would do well to take some of these things into account going forward if they’d like to be taken seriously, because we know more today than we knew yesterday and tomorrow will be more of the same. Use that information! Don’t stare it in the face and deny its existence.
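To give a feel for how component rates can turn back into a slash line, here is a simplified sketch. It is not the workbook's formula: it assumes BACON is hits per (AB minus strikeouts), ISOCON is extra bases per (AB minus strikeouts), and that every plate appearance is either an at bat or a walk (no HBP, sacrifices, and so on). The function name and the inputs are hypothetical.

```python
def slash_line(k_per_ab, bb_per_pa, bacon, isocon):
    """Rebuild AVG/OBP/SLG from component rates under simplifying assumptions."""
    contact_share = 1.0 - k_per_ab      # share of at bats that end with contact
    avg = contact_share * bacon         # hits per at bat
    iso = contact_share * isocon        # extra bases per at bat
    slg = avg + iso
    ab_share = 1.0 - bb_per_pa          # assumes PA = AB + BB only
    obp = bb_per_pa + ab_share * avg    # walks plus hits, per plate appearance
    return avg, obp, slg


# Hypothetical blended rates for a made-up hitter.
avg, obp, slg = slash_line(k_per_ab=0.20, bb_per_pa=0.08,
                           bacon=0.325, isocon=0.175)
print(f"{avg:.3f}/{obp:.3f}/{slg:.3f}")
```

Leaving out hit by pitches and sacrifices will nudge OBP a little, so treat the output as a ballpark figure.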