This weekend, college football crowned two national champions despite an elaborate system constructed to ensure there was just one. This system did not include a playoff, much to the ire of many fans and journalists. Playoffs sure are nice: they give every team a fair chance and experimentally test a range of possible outcomes. The result is an unambiguous winner: the champion will have beaten everyone who beat everyone. But life rarely gives you (or, at least, me) the chance to experimentally test one outcome against another. Life usually demands big decisions with too little information, just like the BCS. So, can carrying out a little thought experiment on how to reform the BCS reveal some things about how to build models in real-life situations?
Well, there would be no blog entry if I thought the answer to that question were “no.” So, supposing that this exercise is interesting, let’s engage in it.
First, let’s state the overarching problem: college football needs to designate one, and only one, national champion. It also needs to do this without playoffs, and without destroying the existing bowl system, which consists of four major bowls, a number of secondary bowls and seemingly innumerable small bowls. These bowls have traditionally hosted matchups between two teams from set conferences.
The current system rotates the national championship game, #1 against #2, among the four major bowls. Who is #1 and #2 is determined by a complicated formula that takes into account two polls of humans, several computers, and the difficulty of each team’s schedule. This formula used to take into account how badly a team beat its opponents, but that was dropped because the mean old University of Florida was beating up on poor little Southwestern Arkansas Junior Teacher’s College just to pad the margin of victory.
Now, there are a few problems with this model. Most obviously, the computers are a source of continual controversy. Most of the ranking algorithms are kept private, so there’s no way to peer-review the manner in which the various computers determine their rankings. So what is the value of these algorithms? Sadly, unknown. Computer rankings can be kept, but the algorithms should be made open. These algorithms should also be held to specific, measurable standards. For instance, a computer whose higher-ranked schools regularly lose to lower-ranked schools should be removed or forced to change its algorithm at the end of the season.
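To make that last test concrete, here is a minimal sketch of how a computer’s “upset rate” could be measured. Everything in it is an assumption for illustration: the function name `upset_rate`, the data shapes, and the toy teams are made up, and the BCS defines no such check.

```python
def upset_rate(rank, games):
    """Fraction of games in which the higher-ranked team lost.

    rank:  dict mapping team name -> rank position (1 is best), hypothetical input
    games: list of (winner, loser) tuples for the season
    """
    # Only count games where this computer ranked both teams differently.
    decided = [(w, l) for w, l in games
               if w in rank and l in rank and rank[w] != rank[l]]
    if not decided:
        return 0.0
    # An upset is a win by the team with the larger (worse) rank number.
    upsets = sum(1 for w, l in decided if rank[w] > rank[l])
    return upsets / len(decided)


# Toy example: one computer's ranking and four results.
ranking = {"A": 1, "B": 2, "C": 3}
results = [("A", "B"), ("C", "A"), ("B", "C"), ("C", "B")]
rate = upset_rate(ranking, results)  # 2 upsets in 4 games
```

What counts as “regularly” is a policy question the numbers can’t answer; one option would be to compare each computer’s rate with the others at season’s end and put the worst performer on notice.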
The human ratings have been criticized for overemphasizing recent games at the expense of games from earlier in the season, and nationally televised games at the expense of those carried only on radio or local TV. This may be true, but, as the Navy showed when it developed PERT, a scheduling method built on expert estimates, there’s a lot of non-quantifiable knowledge in the brains of experts. Keep the human polls, but require the voters to see at least a portion of some minimum number of games featuring teams from some minimum distribution of conferences.
It was a mistake to remove margin of victory from the equation. After all, when two possible outcomes are tested, it makes sense to measure not only the outcome but also its magnitude. Imagine, for a moment, that an executive had a choice between a course of action that would lose her company millions, and a course that would lose her company hundreds of millions. No sane individual would rank these options equally as a “lose”; clearly one is a “loss” and the other is a “disaster”. So, keep margin of victory, but take a lesson from Vegas and compare the margin to the spread. It would be easy to implement: based on last week’s games (or last year’s games, or some such sample), what is the mean margin of victory by a team with a similar rank to the higher-ranked team over a team with a similar rank to the lower-ranked team in the game? Then, for instance, #8 Florida would be _expected_ to beat #203 Southwestern Arkansas Junior Teacher’s College by 40 points, and would get no boost for winning by exactly that much. A team that beats the mean gets a boost; a team that falls below the mean can even have points subtracted; a team on the mean gets zero because, statistically, that is what it should have won by.
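The margin-versus-spread idea can be sketched in a few lines. This is only one way to read “similar rank”: it lumps ranks into buckets of 25, which is my assumption, as are the names `BUCKET`, `expected_margins` and `adjustment` and the sample games.

```python
from collections import defaultdict
from statistics import mean

BUCKET = 25  # assumed bucket width: ranks 1-25, 26-50, and so on


def bucket(rank):
    """Map a rank to its bucket index (ranks 1-25 -> 0, 26-50 -> 1, ...)."""
    return (rank - 1) // BUCKET


def expected_margins(past_games):
    """Mean margin for each (higher bucket, lower bucket) pairing.

    past_games: list of (higher_rank, lower_rank, margin), where margin is
    the higher-ranked team's points minus the lower-ranked team's points
    (negative if the lower-ranked team won).
    """
    samples = defaultdict(list)
    for hi, lo, margin in past_games:
        samples[(bucket(hi), bucket(lo))].append(margin)
    return {pair: mean(margins) for pair, margins in samples.items()}


def adjustment(hi_rank, lo_rank, margin, table):
    """Points credited to the higher-ranked team: actual minus expected margin."""
    pair = (bucket(hi_rank), bucket(lo_rank))
    if pair not in table:
        return 0  # no comparable games in the sample, so no adjustment
    return margin - table[pair]


# Made-up sample: two top-25 teams beat teams ranked 201-225 by 38 and 42.
table = expected_margins([(5, 210, 38), (12, 220, 42), (3, 30, 7)])
on_the_mean = adjustment(8, 203, 40, table)  # wins by 40, expected 40
blowout = adjustment(8, 203, 54, table)      # wins by 54
squeaker = adjustment(8, 203, 21, table)     # wins by only 21
```

The lower-ranked team would get the same number with the sign flipped, so a junior college that holds Florida to a 21-point win is rewarded for it. How these adjustments would be scaled before feeding into the overall formula is left open here.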
Is this a perfect way to determine a champion? No. But it’s open, it takes advantage of hidden knowledge, and it measures results and compares them to norms. It’s not a bad start.