Predicting, Predicting, Predicting

I suppose I should rise to Ballwonk's challenge:

  • Nationals season record: 65-97
  • Nationals NL East division place and games back (if any): Fifth, 28 games back
  • Date on which Nick Johnson first appears in a Nationals game: July 26
  • Date on which Nick Johnson suffers season-ending injury: He doesn't
  • Nationals team leader in pitching starts, with number of starts: John Patterson, 26
  • Total number of starting pitchers used: Eleven
  • Number of ejections for Manny Acta: Two
  • Guzman's batting average, on-base percentage, and slugging average: .482/.661/1.159 . . . on second thought, .266/.303/.369
  • Nationals home runs at RFK: Sixty-eight
  • Paid attendance for the July 21 game against Colorado at RFK (our only Fox national broadcast of 2007): 24,657

Maybe I should have thought about those for more than thirty-five seconds. Nah . . .

* * * *

When I was in college, I had a friend who was sort of familiar with what I suppose one could broadly call "sabermetrics." He agreed a high on-base percentage was a boon to a hitter, and he generally acknowledged it was hard for a pitcher to get by without a competitive strikeout rate. If we had attended college ten or twelve years later, he would have probably absorbed a good bit of Baseball Prospectus, but this was when VORP was primordial glop, rather than an analytical tool that sounds like primordial glop. Recalling the comparatively few statheady sources that existed back in those halcyon days---during the reign of the great Roger Maynard, Usenet's preeminent troll---one would imagine my friend would have held some respect for Bill James, the voice of the stats. But he didn't, for one very simple reason.

"He used to give a prediction every year for Sport magazine," my friend explained, "and he was always wrong. Very badly wrong."

I cannot confirm James's prediction record for that publication, but I do recall him commenting that he disliked making preseason predictions and considered the exercise just short of futile. There exist many reasons why one would hold this distaste. It's a long season. Players tend to get injured. Teams tend to make trades. Prospects tend to sink or swim. Established players tend to blow up or fold the tent. And so forth.

Predicting even one five-team division is a difficult chore because there are so many factors to balance. Imagine assembling a tall bookshelf without screwing the pieces together in an orderly and deliberate fashion. You try to keep the thing balanced while one piece is sliding, another piece is bending, and yet another piece is about to break. In the National League East, for instance, how you predict the Nationals or Marlins directly affects how you predict the other three teams. Let's say your baseline for the Mets is 90 wins; disregard whether or not that is reasonable at the moment. But is that 90 wins based merely on New York's talent, or is it 90 wins after considering the talent of opponents such as Washington or Florida? If it's the latter, then good---but now you have to think how good Washington or Florida is in light of the talent fielded by the divisional opponents.

Theoretically, I suppose those considerations would serve to cancel themselves out---and they certainly sound circular---but in practice all of this seems in play and so much more. At a minimum, Washington and Florida do not comprise a single unit. The Nats might square up well against one divisional opponent, okay against another, poorly against yet another, and disastrously against the last one. The Marlins might do the same, or they might do the reverse, or they might match up equally well or poorly against two, or three, or all four. They could both turn out to be bad teams and yet affect the division in entirely different ways. In the midst of all this mayhem, the standings sort themselves out, and some of these dynamics are ridiculously unpredictable.

To provide a real-life gloss on this discussion, the A's defeated the Angels by four games in last season's American League West race. Without knowing anything more about the two teams, an observer could conclude an entirely different team, the Seattle Mariners, decided the divisional race:

Team     Overall  vs. SEA  vs. others
Oakland  93-69    17-2     76-67
LAAAAAA  89-73    10-9     79-64

As the chart notes, the hilarious Svengali act the A's exerted over the M's decided the division. If you could somehow "Seattle-adjust" the standings, then the result would be entirely different and not even all that close. Heck, if you merely give the A's a reasonable record against the M's, like 13-6, the division goes to a playoff. (Again, this statement is conditional on other factors not "shifting," like the performance of the respective contenders once it became clear the race was over.)
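For anyone who wants to check my math, here is a quick back-of-the-napkin sketch in Python. It reads the chart's three columns as overall record, record against Seattle, and record against everybody else. The helper names (`minus`, `plus`, `games_ahead`) are mine, made up for illustration; only the won-lost records come from the chart.

```python
# Records from the chart above, as (wins, losses).
overall = {"Oakland": (93, 69), "LAAAAAA": (89, 73)}
vs_seattle = {"Oakland": (17, 2), "LAAAAAA": (10, 9)}


def minus(total, part):
    """Subtract one won-lost record from another."""
    return (total[0] - part[0], total[1] - part[1])


def plus(a, b):
    """Add two won-lost records together."""
    return (a[0] + b[0], a[1] + b[1])


def games_ahead(leader, trailer):
    """Standard games-behind arithmetic: half the sum of the win gap and loss gap."""
    return ((leader[0] - trailer[0]) + (trailer[1] - leader[1])) / 2


# "Seattle-adjusted" standings: throw out every game against the Mariners.
seattle_adjusted = {team: minus(overall[team], vs_seattle[team]) for team in overall}

# Real-life margin, with Oakland in front.
actual_margin = games_ahead(overall["Oakland"], overall["LAAAAAA"])

# Adjusted margin, with the Angels in front.
adjusted_margin = games_ahead(seattle_adjusted["LAAAAAA"], seattle_adjusted["Oakland"])

# The hypothetical: give the A's a merely reasonable 13-6 against Seattle.
what_if_oakland = plus(seattle_adjusted["Oakland"], (13, 6))

print(seattle_adjusted, actual_margin, adjusted_margin, what_if_oakland)
```

Strip out the Seattle games and the Angels lead by three; hand the A's 13-6 instead of 17-2 and both clubs sit at 89-73.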

Ah, but that Oakland-Seattle thing was a straight-up fluke. How can you predict that? Well, I suppose you really can't. You can predict certain things with reasonable accuracy based on observation and intuition. Milo Hamilton rendering baseball on the radio a miserable experience sort of stands out. But some things don't lend themselves to easy prediction. Oakland taking 17 of 19 from Seattle can sort of be classified under the general "So that was Mallrats?" reaction.

* * * *

With the rise of the Stat Drunk Computer Nerd/BPro/BTF/THT/blog culture, I've noticed a certain semantic shift in recent years. Predictions are sort of passé; instead, we deal in projections. I'm not sure if the distinction makes a difference in the long run, but as a matter of connotation I've observed:

  • Prediction, n., wild-assed guess, often subject to the vagaries of fandom, foam fingers, pettiness, and split-second decision-making.
  • Projection, n., careful conclusion achieved through thorough analysis, subject to the whims of real life as manifested in a specific season not squarely contemplated.

Perhaps I overstate, but that seems to be the long and short of it. These days, what you do is run every position and player through a projection system, then project team performance based on those projections. And, just to make sure you've filtered luck out of the process, you run those projections, say, one thousand times and report the average. You get results like those of the Hardball Times, which has all of two teams winning 90 or more games---you'll never guess which---and no teams losing more than 95 games. You see, that's a classic projection based on team quality, rather than a prediction of results in a specific year. In other words, it's reasoned, but it ain't wild-assed.
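To see why the run-it-a-thousand-times approach bunches everybody toward the middle, here is a toy sketch in Python. It is emphatically not the Hardball Times' method or anyone else's real system: it assumes a team is nothing more than a single fixed chance of winning each game (the hypothetical `win_prob`), plays 162 coin flips per season, and repeats that a thousand times. The function names and the seed are invented for illustration.

```python
import random


def simulate_season(win_prob, rng, games=162):
    """Play one season of independent games and return the win total."""
    return sum(1 for _ in range(games) if rng.random() < win_prob)


def project_wins(win_prob, runs=1000, seed=2007):
    """Simulate many seasons; return the average, worst, and best win totals."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    seasons = [simulate_season(win_prob, rng) for _ in range(runs)]
    return sum(seasons) / runs, min(seasons), max(seasons)


# A hypothetical dead-average team: a coin flip every night.
mean_wins, worst, best = project_wins(0.5)
print(f"average: {mean_wins:.1f} wins, worst season: {worst}, best season: {best}")
```

The reported average lands close to 81 wins, while the individual seasons inside the simulation stray a good deal further in both directions. The projection reports the average; the actual season is one draw from the spread.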

I'm going for wild-assery, just so you know. I'll let others hit on the squirrels and haikus.

AL East

1. New York (jerks always win)
2. *Boston (jerks always wish they always win, except for when they win, in which case you wish they hadn't won, until you remember what they're like when they don't win)
3. Toronto (move 'em to the AL Central and they finish . . . hmmn, that doesn't work anymore)
4. Baltimore (reinforced bullpen wins the day; avoiding last place = mission accomplished)
5. Tampa Bay (best last place in baseball except for two or three others)

AL Central

1. Chicago (PECOTA's got them winning like 18 games this season; I'll take the over on that)
2. Minnesota (that Ramon Ortiz sure can pitch!)
3. Cleveland (third, but 131-21 Pythag record nets an Adjusted Pennant, and adjusted flags adjust forever)
4. Detroit (the only way you start another decade-long losing string is to start now)
5. Kansas City (among Nats fans, "Gil Meche" has been eclipsed by "Stan Zoom" as stupidest expression)

AL West

1. Seattle (yes, I'm kidding; but I'm too lazy to delete it, so this is where I'll go I guess)
3. Oakland {finishes third, throws chair}
4. Texas (Buck's gone, which means they should win Mr. Universe; but consider this the year when Saberhagen had a bad odd-numbered year)

NL West

1. San Diego (Greg Maddux and David Wells combined are younger than my grandmother, but she throws harder than the former and could outrun the latter)
2. *Los Angeles (Ned Colletti is a great baseball man, by which I mean if the Dodgers are in trouble he'll trade for Mark Hendrickson's cloned sheep)
3. Arizona (thanks for Rizzo)
4. Colorado (insert humidor joke here)
5. San Francisco (nothing of any note happening here at any time this summer)

NL Central

1. Houston (Dan Wheeler takes over for Lidge, saves 96 games)
2. La Russa (Joe Buck taint still all residue-ey)
3. Chicago (I thought Soriano wanted to stay in DC!!!!)
4. Milwaukee (in NBA, they'd be like the seventh seed in the East)
5. Pittsburgh (Jim Tracy instills winning ways in Pittsburgh, leads Pirates to yet another losing season)
6. Cincinnati (look out for Homer Bailey-Mike DeJean trade)

NL East

1. Philadelphia (Charlie Manuel canned after slow start, Dallas Green leads Phils to division title, Hamels' arm to early grave)
2. Atlanta (Francoeur doesn't walk, blah blah blah)
3. New York (no Trachsel, no division title)
4. Marlins (sort of like Tampa, except no dome, which is actually in St. Pete anyway)
5. Nationals (you know who I miss - the Turkey Hill lady; Shoo, fly, shoo!)

World Series: Yankees over Dodgers.