December 1, 2011
Howe and Why
by Robert Vollman
Borrowing an idea from baseball's Bill James (as we often do), our very first Hockey Prospectus article introduced historical comparables and similarity scores to hockey.
Bill James introduced the concept in his 1985 Baseball Abstract: calculate the similarity between two player-seasons, compound it over a career, and use the result to compare one player in history to another. In that particular case, he was studying who should be in the Hall of Fame and who shouldn't, an approach Iain Fyffe brought to hockey with the Inductinator not long ago. Eventually, though, its usefulness in predicting a player's career path was established, most famously by the effort led by our own grandfathers Nate Silver and Clay Davenport in their PECOTA system.
Just as our cousins translated the idea into football's KUBIAK, basketball's SCHOENE, and of course Tom Awad's hockey VUKOTA, it also forms the basis of the SNEPSTS projection system. Beyond projecting the most likely future performance of individual players, this approach can also calculate the probability of a breakout, based on how often a player's historical counterparts had one.
Arbitrarily defining a breakout as an increase of 0.2 points per game, the SNEPSTS system went through every 2009-10 player and automatically identified their percentage of close historical comparables whose scoring jumped. The results ranged from those under 1% like Antoine Vermette, Ryan Smyth, Nik Antropov, and Tomas Fleischmann, all the way up to 36% for Milan Lucic and 38% for Derick Brassard (Lucic hit it, Brassard didn't).
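To make the calculation concrete, here is a minimal sketch of the breakout percentage described above. It is not the actual SNEPSTS code: the function name, the input format (a list of before-and-after points-per-game pairs for a player's close comparables), and the sample numbers are all assumptions made for illustration.

```python
BREAKOUT_THRESHOLD = 0.2  # points per game, the arbitrary definition used in this article


def breakout_probability(comparables, threshold=BREAKOUT_THRESHOLD):
    """Share of comparables whose scoring rose by at least `threshold`.

    `comparables` is a hypothetical input format: a list of
    (ppg_before, ppg_after) pairs, one per close historical comparable.
    Returns None when there are no comparables to judge by.
    """
    if not comparables:
        return None
    # Round the difference so that floating-point noise does not turn an
    # exact 0.2 jump (e.g. 0.50 -> 0.70) into a near miss.
    jumps = sum(
        1 for before, after in comparables
        if round(after - before, 6) >= threshold
    )
    return jumps / len(comparables)


# Made-up comparables for an imaginary player: two of the five jumped.
sample = [(0.45, 0.70), (0.50, 0.48), (0.40, 0.55), (0.52, 0.80), (0.47, 0.50)]
print(breakout_probability(sample))  # 0.4
```

How the comparables are chosen in the first place (the similarity score itself) is outside this sketch.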
While individual performances may vary from one end of the spectrum to another, players grouped together seldom miss. Sure enough, the results came out as projected.
We looked only at players with at least 10 historical matches, at least one of which had a breakout, and divided them into four groups: those with at least a 20% historical chance of a breakout, those between 10% and 19.9%, those between 5% and 9.9%, and finally those under 5%. Each group had roughly the expected number of breakouts.
Breakout Probability, 2010-11

Group                 Predicted   Actual
Group A (20%+)        23%         24%
Group B (10-20%)      13%         14%
Group C (5-10%)       7%          4%
Group D (under 5%)    3%          0%
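A check like the one in this table can be sketched in a few lines. The version below is an illustration under stated assumptions, not the SNEPSTS implementation: the function names and the input format are hypothetical, and apart from the 36% and 38% figures quoted earlier for Lucic and Brassard, the sample values are invented.

```python
def group_label(probability):
    """Assign a player to one of the four groups used in the table."""
    if probability >= 0.20:
        return "A"
    if probability >= 0.10:
        return "B"
    if probability >= 0.05:
        return "C"
    return "D"


def calibration(players):
    """Compare predicted and actual breakout rates within each group.

    `players` is a hypothetical input format: a list of
    (predicted_probability, broke_out) pairs, one per player.
    Returns {group: (average predicted rate, actual breakout rate)}.
    """
    buckets = {}
    for probability, broke_out in players:
        buckets.setdefault(group_label(probability), []).append((probability, broke_out))

    table = {}
    for label, rows in buckets.items():
        predicted = sum(p for p, _ in rows) / len(rows)
        actual = sum(1 for _, hit in rows if hit) / len(rows)
        table[label] = (predicted, actual)
    return table


# Lucic (36%, hit) and Brassard (38%, miss) are from the article;
# the remaining rows are invented to populate the other groups.
sample = [(0.36, True), (0.38, False), (0.12, False), (0.08, False), (0.01, False)]
for label, (predicted, actual) in sorted(calibration(sample).items()):
    print(f"Group {label}: predicted {predicted:.0%}, actual {actual:.0%}")
```

With only a handful of players per group the two columns will rarely agree; the comparison only becomes meaningful over a full season's worth of players.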
Milan Lucic's SNEPSTS projection for 2010-11 was 0.19 goals per game and 0.29 assists per game, very similar to Todd Bertuzzi and Brad Boyes, for instance. However, Lucic's historical comparables jumped 0.2 points per game 36% of the time, as compared with Todd Bertuzzi's historical comparables (10%) and Brad Boyes' (8%), an important detail!
With that in mind, we ran the same analysis on this year's players and can therefore present the following list, and say with some confidence that six of them will have a breakout season (as we've arbitrarily defined it). Among those with at least a 20% chance of a breakout, the group average is 26%, which works out to 5.7 of the 22 players. Though we don't know exactly which ones, six of them will increase their scoring by 0.2 points per game this season.
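The arithmetic behind "5.7 of the 22 players" is simply the group size multiplied by the group's average breakout probability. A minimal sketch follows; the function name is ours, chosen for illustration.

```python
def expected_breakouts(group_size, average_probability):
    """Expected number of breakouts: group size times average probability."""
    return group_size * average_probability


# 22 players with an average 26% chance of a breakout.
print(round(expected_breakouts(22, 0.26), 1))  # 5.7
```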
2011-12's potential breakouts: Jamie McBain, Tyler Bozak, Brandon Yip, T.J. Galiardi, Alexandre Picard, Ryan O'Reilly, Alexander Ovechkin, Josh Bailey, Jamie Langenbrunner, Wayne Simmonds, Tyler Ennis, David Booth, Kyle Okposo, Wojtek Wolski, Erik Johnson, Patrick Kane, Lee Stempniak, Evgeni Malkin, Brandon Sutter, Mathieu Perreault, Troy Brouwer, and Jannik Hansen.
Right now the likeliest candidates are definitely Tyler Bozak, Ryan O'Reilly, and Evgeni Malkin, joined perhaps by Troy Brouwer, Jannik Hansen, and a surprise.
There are obviously some unusual names on that list, no doubt a consequence of the unreliable results that come with small sample sizes: Jamie McBain had just 14 NHL games going into last season, for instance. The good news is that Neil Greenberg and I are working on SNEPSTS version 2.0, which will build on what we've learned in its three-year existence, hopefully taking yet another long step forward in accuracy and also incorporating both breakout and regression probabilities. Any comments or suggestions would be perfectly timed right now.
When you're projecting a player to score 0.48 points per game, as we did last year for Milan Lucic, Todd Bertuzzi, and Brad Boyes, it's very useful to know that the probability of a breakout is three to four times higher for Lucic than for Bertuzzi and Boyes. The advantage of any system based on Bill James' almost 30-year-old principle of searching history for comparable players, like SNEPSTS, is that it can give you that key piece of information.
On a closing note, check out Justin Kubatko's methodology over at the incomparable Hockey-Reference.com web site, which also incorporates Point Shares (their version of GVT), though the projection component is still being developed.
Robert Vollman is an author of Hockey Prospectus.