If you're like me, you grew up scanning the sports pages and studying rows and rows of box scores. The box score had some pretty simple numbers in it. For every player in the game, you knew how many minutes he played, how many field goals he attempted and made, three pointers attempted and made, rebounds (usually offensive and defensive), assists, turnovers, steals, number of fouls, and total points... along with some other secondary stuff (refs, attendance, etc.). A box score is a simple number chart, a slug of data representing a game played. I am an ignoramus and I love box scores.
For years, this was enough. Then came Bill James, then fantasy sports leagues, and stats became a game inside the game. And now it's 2011 and it seems the box score is an anachronism, a thing from the past. Oh, the box score's numbers aren't totally useless; the data they contain are the basic building blocks of advanced stats. But, unlike the eleven or twelve simple numbers in the box score, advanced stats are seemingly infinite. They range from a fairly straightforward number like plus/minus to various weighted scores like Player Efficiency Rating (PER) and Wins Produced (WP)... numbers that attempt to assign a single overall rating to every player. These "one stat" ratings tend to evaluate a player's "efficiency" through complicated formulas, but while the result is a simple single number, the calculation behind that number is very complex and makes my head hurt. Actually my head hurts a lot right now, so... I'm going to turn to someone who knows far more about this stuff than I (the ignoramus):
Steve Perrin started Clips Nation in 2006. He's not only the foremost expert on the Clippers, he's also a smart guy... and knows a lot about this stat stuff. He's got some opinions and he's agreed to help me with what some of these numbers actually mean and how they work. Maybe, together, we can figure out which stats are valuable and which are not.
The Ignoramus: Welcome Steve, happy to have you here.
Steve Perrin: Thank you, I'm happy to be here.
TI: You know Steve, I'm ignorant about advanced stats. You think you can help me?
SP: Well, really it depends on how ignorant you are. I'm far from an expert on every advanced stat, but I probably know more than the average bear... or in this case, than an ignoramus.
TI: Well, okay, but let's try, shall we? In the last year or two NBA.com added the plus/minus (+/-) to their box scores. What can you tell me about plus/minus?
SP: Sure. Plus/Minus is the points a player's team scores less the points the opposing team scores while the player is on the floor. It's really the simplest of the "advanced stats". Plus/minus was borrowed from hockey stats.
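For readers who like to see the arithmetic, here is a minimal sketch of that definition in Python. The "stint" layout (who was on the floor, points scored, points allowed) and the player names are made up for illustration; they are not how NBA.com stores or computes its numbers.

```python
from collections import defaultdict

def plus_minus(stints):
    """Sum (team points - opponent points) over every stint a player was on the floor.

    `stints` is a hypothetical format: (players_on_floor, team_points, opponent_points).
    """
    totals = defaultdict(int)
    for players, team_pts, opp_pts in stints:
        for player in players:
            totals[player] += team_pts - opp_pts
    return dict(totals)

# Toy game for one team, split into three stretches (invented numbers).
stints = [
    ({"A", "B", "C", "D", "E"}, 12, 8),   # starters win this stretch by 4
    ({"A", "B", "C", "F", "G"}, 6, 10),   # mixed unit loses by 4
    ({"F", "G", "H", "I", "J"}, 5, 9),    # reserves lose by 4
]
pm = plus_minus(stints)
print(pm["A"], pm["D"], pm["F"])
```

Player A was on the floor for a +4 stretch and a -4 stretch and nets out to zero, while D only played the good stretch. That is the "noise" problem in miniature: the number depends heavily on who else was out there.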
TI: In your opinion is plus/minus a valuable stat?
SP: Not very. There's a ton of noise (external influences) in plus/minus. Straight plus/minus is of course more dependent on team quality than any other factor - players that play on winning teams have strong plus/minus numbers... so Derek Fisher has a good plus/minus...
TI: Derek Fisher!? That guy couldn't defend a rain barrel!
SP: Exactly. But the Lakers as a team tend to outscore their opponents, so Fisher's plus/minus reaps the reward. Beyond that obvious problem, there's the impact of the specific players on the floor at the time; basketball involves 10 players on the floor, and one player's plus/minus is therefore affected by the quality of the other 9 players, both teammates and opponents. Starters tend to play against starters, reserves tend to play against reserves, and plus/minus can feel like comparing apples to oranges consequently.
TI: Consequently yes. I'm with you, my brother.
SP: Still, in an individual box score, plus/minus can be illustrative. If every starter has a huge positive plus/minus and every reserve is negative, then you know that the first unit played well and the second unit did not. Or if, in a very close game, one player has a large positive plus/minus in relatively limited minutes, then you know that player had a real impact on the game. I never use plus/minus as an indication of overall player quality across a season, but it can be interesting on a game-by-game basis.
TI: Okay, okay, this is good. So, plus/minus is useful but no more useful than another simple stat.
SP: Well now, stat geeks have tried to address the larger issues with plus/minus in a couple of ways. The first is relatively straightforward: they apply the "On/Off Value" (available at 82games.com). The on/off compares how a team does while a player is on the floor versus how they do when he is off the floor. That normalizes a bit for the team quality problem - even good teams should in principle see a drop-off when their best players rest.
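A rough sketch of the on/off idea, again in Python. Scaling to points per 100 possessions is an assumption made here so the "on" and "off" samples are comparable; the stint format and the names are hypothetical, and this is not presented as the exact formula any particular site uses.

```python
def net_rating(stints):
    """Point differential per 100 possessions over a set of stints."""
    margin = sum(team_pts - opp_pts for _, team_pts, opp_pts, _ in stints)
    possessions = sum(poss for _, _, _, poss in stints)
    return 100.0 * margin / possessions if possessions else 0.0

def on_off(player, stints):
    """Team net rating with the player on the floor minus net rating with him off."""
    on = [s for s in stints if player in s[0]]
    off = [s for s in stints if player not in s[0]]
    return net_rating(on) - net_rating(off)

# Hypothetical stints: (players_on_floor, team_points, opponent_points, possessions)
stints = [
    ({"Star", "B"}, 30, 20, 25),   # +10 in 25 possessions with Star on
    ({"Sub", "B"}, 18, 22, 20),    # -4 in 20 possessions with Star off
]
star_on_off = on_off("Star", stints)
print(star_on_off)
```

Note that the "off" half of the subtraction is entirely about what happens when someone else is playing, which is why the quality of the replacement leaks into the number.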
TI: So that fixes the bug. On/Off perfects plus/minus. Am I right on that?
SP: Sadly, on/off is imperfect - there's still plenty of noise in those other 9 players, and garbage time can really skew the numbers for players who don't play a lot of minutes. More to the point, on/off is not just a function of the player himself but also of his replacement. Blake Griffin's on/off probably gets stronger when Craig Smith is hurt and Brian Cook is backing him up - but that has nothing to do with Griffin.
TI: Right, because Smith is better than Cook, when Smith replaces Griffin the team is better than when Cook replaces Griffin. So Blake's number goes up. I almost understand.
SP: The most interesting variant on the plus/minus is something called the Adjusted Plus/Minus (APM). Basically, APM tries to normalize for all those 9 other dudes on the floor. It gives every player a relative value, then it looks at how player x impacted the game given the fixed values of the other 9 players on the floor at the time. If all of player x's teammates are rated as stiffs, while all the opponents are rated as superstars, and yet player x's team outscores their opponent while he's on the floor, then x's APM will be through the roof. The math involved in this process gets pretty nasty and my degree is far enough in the rearview mirror that I don't completely understand it (though I'm pretty sure that Bayes' Theorem figures into this somewhere). As you might expect, APM gets much better with larger sample sizes (i.e., the more data it has to work with) as the algorithms have to iteratively adjust the starting values until they find the best fit.
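To make the "best fit" idea a little more concrete, here is a toy sketch in Python. It assumes one common way of framing the problem, a regularized least-squares fit in which the margin of each stint is modeled as the sum of one side's player ratings minus the sum of the other side's, solved by plain gradient descent. That framing, the `ridge`, `lr`, and `steps` parameters, the 2-on-2 lineups, and all the numbers are assumptions for illustration; real APM versions differ from each other and from this.

```python
def adjusted_plus_minus(stints, ridge=0.1, lr=0.01, steps=5000):
    """Fit one rating per player so that, for each stint,
    sum(ratings of side A) - sum(ratings of side B) approximates the margin.

    Minimizes squared error plus a small ridge penalty by gradient descent.
    `stints` is a hypothetical format: (side_a_players, side_b_players, margin_for_a).
    """
    players = sorted({p for a, b, _ in stints for p in a | b})
    rating = {p: 0.0 for p in players}          # everyone starts as "average"
    for _ in range(steps):
        grad = {p: ridge * rating[p] for p in players}
        for side_a, side_b, margin in stints:
            predicted = sum(rating[p] for p in side_a) - sum(rating[p] for p in side_b)
            error = predicted - margin
            for p in side_a:
                grad[p] += error
            for p in side_b:
                grad[p] -= error
        for p in players:
            rating[p] -= lr * grad[p]           # nudge ratings toward a better fit
    return rating

# Toy 2-on-2 stints (invented): X's side wins whenever X plays, loses when he sits.
stints = [
    ({"X", "T1"}, {"O1", "O2"}, 10),
    ({"X", "T2"}, {"O1", "O3"}, 8),
    ({"T1", "T2"}, {"O2", "O3"}, -6),
    ({"T1", "T2"}, {"O1", "O2"}, -4),
]
rating = adjusted_plus_minus(stints)
for name in sorted(rating, key=rating.get, reverse=True):
    print(name, round(rating[name], 2))
```

With only four stints the ratings are barely pinned down, which is the small-sample problem in action: there are more players than observations, and the ridge penalty is what keeps the fit from being arbitrary.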
TI: Bayes' theorem? Thomas Bayes, right? He's the dude who linked a conditional probability to its inverse: P(A|B) is related to P(B|A). I mean, what could be simpler?
SP: Here's my favorite thing about Bayes: he was an 18th-century minister in the English countryside, and at the time that he proposed it, his theorem was little more than an intellectual curiosity. It takes large amounts of computer processing power to make practical use of Bayes' Theorem, processing power that didn't exist until centuries after his death.
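For reference, the theorem being discussed, which links a conditional probability to its inverse, is usually written as follows (for events A and B with P(B) greater than zero):

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```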
TI: Gee, I didn't even know they had basketball back then. Did you know that a guy named "Kafka" is playing for the Philadelphia Eagles (they're a football team). I don't think it's the same guy but I understand roaches live a really long time, you follow?
SP: Where was I? Oh yeah, APM. I'm very intrigued by the potential of APM. Basketball is a two-way game. The goal is to outscore your opponent, but you do that either by playing offense so well that you score a lot of points or by playing defense so well that you give up very few points, or more likely by some combination of those two things. But defense only shows up in the box score sporadically, so a player like Shane Battier is undervalued by metrics that depend on computing a weighted score from his box score statistics. APM looks at the holistic picture, and therefore has the potential to accurately reflect the value of players like Battier. One guy who stands out in APM but not in any other statistical measure is Nick Collison of the Thunder; according to APM, OKC is simply a better team with him on the floor. (It probably didn't hurt that he was replacing Jeff Green most of the last several seasons.)
TI: So, is APM the Holy Grail?
SP: Maybe. Maybe. It's the approach that has the fewest logical flaws in my estimation - but the practical challenges are immense. People much MUCH smarter than me are working hard to try to perfect it, and there's little doubt in my mind that it is getting better and better. One problem with APM is that it is so mathematically complex that it will probably never be widely accepted because it will never be widely understood. Not to mention that there are different ways to approach the actual calculation, so there are competing versions of the same thing. But in various front offices around the league, versions of APM are being used to identify players that are undervalued by other metrics... and it won't be long before one of those front offices gets it right, if they haven't already.
Well... that's pretty good for a start. I'm going to spend the next day or two reading over this post and then maybe we can continue our discussion.
In the next episode of "Advanced Stats for an Ignoramus" we'll be discussing the new and different ways of looking at shooting efficiency beyond field goal percentage. I promise it will involve fewer theorems.