I've enjoyed the debate about Al Thornton in the comments from my recent post. The debate has not been just about Thornton, but by extension about John Hollinger's "system to evaluate the pro potential of college players." You need Insider to read that article, and because it is 'for pay' content, I won't re-print it here (I excerpt from it below). But Hollinger's methodology is interesting if nothing else.
I've said that I felt Hollinger was going with his gut in seemingly singling out Al Thornton as a bad pick, with lots of hyperbole like 'neon warning signs' and his 'bust-detector in overdrive'. It seemed to me that he wasn't using pure data. I stand corrected; corrected but dubious.
In fact, he is using nothing but data. He has reverse-engineered his numerical system to make sense of the actual results. He never actually stated that the system was reverse-engineered, but that's what he did. Why do I believe that? Well, look at his second most relevant factor, steals:
Clearly he's not using tangible factors that have proven worthwhile in his other work in giving steals 'the most weight' in his college formula. He describes steals as a 'worthless' stat in the NBA - and let's face it, NBA stats are his day job. So how do you explain his inclusion of steals so prominently in his college evaluator? I mean, after all, he's using a 'worthless' stat more heavily than PER, his bread-and-butter, the stat that made him famous. Why? One reason only - it helped make the numbers work.
And age is another factor that helped make the numbers work. Al Thornton was the oldest player in this draft, so Hollinger is in fact not just going with his gut: the results from prior drafts tell him that age is a red flag, in and of itself.
But what about those numbers? He never says it outright, but it would seem that Hollinger has devised his system based on the results of the last five drafts. In fairness, he may be using more. But he attempts to illustrate the efficacy of his system with examples from the past five years, and there are zero references in his article from prior to 2002. So I'm inferring that he used only that data. I could of course be wrong. But consider where he says that the 2007 draft had "the highest-rated player of the past six years." The best collegian in the 2001 draft was probably Shane Battier. Did Battier or some other college player in that draft score more highly than Kevin Durant (and Carmelo Anthony for that matter)? Or did he not go back that far? It seems pretty obvious the answer is the latter.
Figure 30 first rounders and a handful of second rounders from each of the last five drafts. That's roughly 200 players. But then you exclude high schoolers and foreign players - this system was built specifically to evaluate players with college experience. So, call it 150 data points - it's probably far fewer than that, but we'll go with 150. Now, I'm no statistician, but my brother is a PhD, and that is what he would call "a small 'N'" - you can come up with a system that works for those 150 data points, but how useful will it be for the next 20? Answer - probably not very useful.
Speaking of my brother, this paragraph would absolutely make him howl:
Really? The system you reverse-engineered from the actual data works for the data you used to reverse-engineer it? I should hope so! One really needs to see how it works for the 2007 draft - the first one where he wrote down his predictions (pre in this case being a prefix indicating in advance of the outcome) - before one starts crowing about how it works. For the prior drafts, it works by definition. It was built to work. Why would he go to the trouble of building a system that didn't work?
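To see why "it works on the drafts it was built from" proves so little, here is a toy sketch in Python. Everything in it is invented for illustration: the prospects are random numbers, the outcomes are coin flips, and the `predict` function is a deliberately silly stand-in (copy the result of the most similar past player), not Hollinger's formula. The only figures borrowed from the discussion above are the seven factors, the 150 data points and the next 20 prospects.

```python
import random

random.seed(2007)  # arbitrary seed so the run is repeatable

N_FACTORS = 7   # the article says the formula uses seven factors
N_PAST = 150    # the generous guess at the sample size from above
N_NEXT = 20     # "the next 20" prospects


def random_player():
    """A made-up prospect: seven noise 'factors' and a coin-flip outcome."""
    factors = [random.random() for _ in range(N_FACTORS)]
    panned_out = random.random() < 0.5  # independent of the factors
    return factors, panned_out


def predict(history, factors):
    """Hypothetical 'system': copy the outcome of the most similar past player."""
    def distance(past_player):
        return sum((a - b) ** 2 for a, b in zip(past_player[0], factors))
    return min(history, key=distance)[1]


def accuracy(history, players):
    """Share of players whose outcome the 'system' calls correctly."""
    hits = sum(predict(history, f) == outcome for f, outcome in players)
    return hits / len(players)


past = [random_player() for _ in range(N_PAST)]
upcoming = [random_player() for _ in range(N_NEXT)]

in_sample = accuracy(past, past)          # graded on the data it was built from
out_of_sample = accuracy(past, upcoming)  # graded on players it has never seen

print(f"past drafts: {in_sample:.0%}")
print(f"next class:  {out_of_sample:.0%}")
```

On the past drafts this "system" is perfect by construction, because every player's closest match is himself. On the new class it is guessing, since the factors never had anything to do with the outcomes. A real formula is nowhere near this extreme, but the direction of the problem is the same: the honest test is the draft class it hasn't seen yet.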
OK, enough picking nits. The job of statisticians in popular culture is not easy. If you give enough explanation to satisfy the PhDs, you bore 99.99% of the audience to tears. But if you don't, those 0.01% will accuse you of having a "small 'N'" or say disparaging things about your standard deviation, or even your mother's standard deviation (those stat guys can talk some serious smack).
Let's return to the primary point I was trying to make in yesterday's post - Thornton's age to me is a plus, not a minus. By contrast, age (the younger the better) is one of seven factors Hollinger uses in his formula. He doesn't give weighting information, but he does list age first, so take that for what it's worth. Here's what he has to say:
That's why "veteran" rookies like [Melvin] Ely, [Dan] Dickau, Rafael Araujo and Francisco Garcia have underwhelmed at the NBA level, while the freshman stats of a player like Chris Bosh take on new meaning when you understand his youth.
Sky's the limit for Kevin Durant, who's only 18 years old.
This year's draft is especially alluring on the youth front, since the NBA's new age restrictions resulted in several coveted players going the one-and-done route -- most notably Kevin Durant and Greg Oden.
On the other hand, Florida State's Al Thornton is walking around with a giant neon warning sign, as he will turn 24 a little over a month into his rookie season. He's more than a year older than LeBron James, and nearly two years older than Darko Milicic.
OK, bear in mind that he's basing his system on the actual data from the last 5 drafts, and he's only including collegians. The best draft picks, the ones that tend to be surefire pros, are the guys who play one year of college, right? Straight-to-pro high schoolers are spotty (and are excluded from his 'collegian' only numbers at any rate). But a guy who could have gone pro as a HS senior, who then has an outstanding freshman year in college (Carmelo Anthony, Chris Bosh, Chris Paul, etc.) is a pretty good bet. IT DOESN'T MEAN THEY WOULD NOT HAVE BEEN GOOD PROS IF THEY HAD STAYED IN SCHOOL AND BEEN DRAFTED AT A LATER AGE! These are the no-brainers, and since they tend to become pros at age 18 or 19, it makes the numbers skew in favor of drafting 18 and 19 year olds. (Again bear in mind that he's analyzing only collegians in this formula, which conveniently excludes straight-to-pro high school busts like Robert Swift and young Euro busts like Yaroslav Korolev - pretty tricky.) It's a self-defining system. The best players, the real cream of the crop, aren't going to play more than two years at the college level - there's too much money waiting for them. But if they did for some reason (think Tim Duncan), it would be an automatic red flag in this system.
From my perspective, including high schoolers and younger foreign prospects, it seems to me that the older picks are the better risks, all other things being equal. But I readily admit that I have a much smaller N than he does for my argument.
On the other end of this argument are the 'older' picks whom he deems busts. He lists all of four from the last five drafts (talk about a small N) who were 'underwhelming', and frankly, I'm underwhelmed at his underwhelmation. Dan Dickau (drafted 30th for FSM's sake) is a bust? Is the last pick in the draft even expected to make it in the NBA? Francisco Garcia (drafted 23rd) is a bust? They sure as hell haven't given up on Garcia in Sacto yet. As for Araujo, well I think you'd have to take that one up with the guys in Toronto - no one in the world had him rated as a top 10 pick except for them. Which leaves Ely as the poster child for older draft pick busts. Big deal. Meanwhile, Luke Jackson was a top five prospect in his system.
But I keep coming back to this - if Al Thornton and other grey beards are only putting up good numbers in college because they are like "a grade-school kid who's been left back twice and dominates the lunchtime kickball game" then why doesn't the argument carry forward into the NBA? If the kids had no chance against Al Thornton, then why are they the best NBA prospects? It makes no logical sense - and for a very specific reason. It wasn't built on logic - it was built on reverse-engineered data. The 18 and 19 year olds were never better prospects because they were younger - they were drafted younger because they were better prospects. Hollinger got his cause and his effect mixed up. It happens.
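The cause-and-effect mix-up can be shown with a toy simulation, sketched here in Python. All the numbers are made up and the model is an assumption chosen to match the argument, not anything taken from Hollinger's data: each imaginary player gets a `talent` score, more talented players are drafted younger, and `nba_value` depends on talent alone. Age has no effect on the outcome in this world.

```python
import random

random.seed(23)  # arbitrary seed so the run is repeatable


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


players = []
for _ in range(5000):
    talent = random.gauss(0, 1)
    # Assumption: the better the prospect, the sooner he leaves school.
    draft_age = 21 - talent + random.gauss(0, 0.5)
    # Assumption: pro value comes from talent only; age plays no part.
    nba_value = talent + random.gauss(0, 0.5)
    players.append((talent, draft_age, nba_value))

# Across everyone, draft age and NBA value are strongly linked...
overall = pearson([p[1] for p in players], [p[2] for p in players])

# ...but among players of nearly equal talent the link all but vanishes.
similar = [p for p in players if abs(p[0]) < 0.1]
like_for_like = pearson([p[1] for p in similar], [p[2] for p in similar])

print(f"age vs. value, all players:    {overall:+.2f}")
print(f"age vs. value, similar talent: {like_for_like:+.2f}")
```

In this made-up league, a formula fitted to the results would happily conclude that older draftees are worse bets, even though age does nothing at all here. Compare players of the same ability and the "age penalty" disappears, which is the "all other things being equal" point.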
I'll admit that with an older player, you have to wonder why he hasn't gone pro earlier. But don't fall into the trap of assuming that he is flawed simply because he is still in college. Look at the actual results, and in Thornton's case, those results are terrific. Why was he a 23-year-old fifth-year senior before he was rated a lottery pick? I don't know - maybe because he didn't concentrate on basketball until after his high school track and field career ended at 19 (the age Hollinger wanted him to go pro).
Time will tell about Al. I'm encouraged by his performance in pre-season, and I love the fact that he can jump over a 7-footer, but he hasn't played a single game that counts yet, so we're way ahead of ourselves if we're retiring his jersey. And Hollinger will revise his system (version 1.0 is always buggy) in future years with additional data points. I for one will save his predictions and see how well he does. If he's right, I will be the first to say so. But out of Nick Fazekas and Al Thornton, I know who I'd rather have on the Clippers.