Predicting the NFL: Part #1

Over the last couple of weeks, I have been collecting data on the last 10 years of the NBA in order to find a good means of predicting a team's win total for a season. With much of that laid to rest, I began to wonder how well the methods that worked for the NBA would hold up when applied to another professional sports league, the NFL.

My goal with this study is to determine a method for accurately predicting a team's win-loss record for a season based on simple statistics that can be pulled from any sports reference site. In order to limit the scope of the study, I chose to take the two methods that I found most effective in the NBA and apply them as best I could to the NFL, using the last 10 years of results for every team as my sample. The two predictors to be tested are the Pythagorean method and the point differential equation method.

The first method we will be using, the Pythagorean method, is based on a time-tested equation developed by many of the forefathers of sports statistical analysis. What it basically states is that a team's win percentage will be about equal to

(Points Scored)^X/((Points Scored)^X+(Points Surrendered)^X)

where X is an exponent that changes depending on which sport is being tested. Since there is not a currently accepted number for professional football that has undergone rigorous testing, we will determine X ourselves during the study by choosing the value that gives the best overall result. During the NBA study, this method proved slightly less accurate but slightly more precise than our next method of testing.
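
For anyone who would rather check the formula in code than in a spreadsheet, here is a minimal sketch of it in Python. The function names and the 16-game season default are my own assumptions for illustration, and the exponent is left as an input since it has not been determined at this point in the study.

```python
def pythagorean_win_pct(points_scored, points_allowed, exponent):
    """Expected win percentage from points scored and points surrendered."""
    scored = points_scored ** exponent
    allowed = points_allowed ** exponent
    return scored / (scored + allowed)


def pythagorean_wins(points_scored, points_allowed, exponent, games=16):
    """Expected win total over a season of `games` games (16 is an assumption)."""
    return games * pythagorean_win_pct(points_scored, points_allowed, exponent)


# Made-up team: scores exactly as many points as it gives up.
# Whatever the exponent, such a team is expected to win half its games.
print(pythagorean_wins(350, 350, exponent=2.0))
```

A team that scores more than it allows gets a win percentage above .500, and a larger exponent pushes that percentage further away from .500 for the same point totals.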

The point differential equation method is developed by taking the point differentials for all of the teams over the last 10 years and graphing them against the number of wins each of those teams had. Once that is done, the best-fitting straight line through the data will be found, and the resulting linear equation will be used to predict how many wins a team should have for a given point differential. We will find this equation ourselves during the course of the study and then use it to test our data. Using this method in the NBA study, the results were slightly more accurate but slightly less precise. Overall, neither of the methods was found to be significantly better than the other.
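
The line fitting described above is what a spreadsheet trendline does behind the scenes. A rough sketch of the same calculation, using ordinary least squares, might look like the following. The function names are hypothetical and the five data points are made up purely to show the mechanics; they are not from the study's spreadsheet.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variation in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up (point differential, wins) pairs, for illustration only.
point_diffs = [-120, -40, 0, 55, 130]
wins = [4, 7, 8, 10, 12]

slope, intercept = fit_line(point_diffs, wins)
print(slope, intercept, r_squared(point_diffs, wins, slope, intercept))
```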

With all of our points and win totals in the spreadsheet, the first step was to create the graph that would provide the linear relation we will use for the point differential method. The graph turned out as below:

Our resulting linearization of the data produced the equation Wins = .0281 × (Point Differential) + 8.005 with an R^2 value of .8435 (the NBA's was about .93, so the NFL's is a somewhat weaker fit, but it is still a decent one). With this equation in hand, we can now predict how many games each team over the last 10 years should have won based on its point differential.
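
To make the equation concrete, here it is as a tiny Python function. The slope and intercept are the ones found above; the function name and the sample point differentials are just assumptions for illustration.

```python
SLOPE = 0.0281      # wins gained per point of point differential (from the fit)
INTERCEPT = 8.005   # predicted wins for a team with a point differential of zero


def predicted_wins(point_differential):
    """Predicted season win total from the fitted linear equation."""
    return SLOPE * point_differential + INTERCEPT


# Illustrative point differentials, not taken from any real team.
for diff in (-100, 0, 100):
    print(diff, round(predicted_wins(diff), 3))
```

Put another way, the slope says that every additional 36 points or so of point differential is worth roughly one extra win (1 / .0281 is about 35.6).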

Using the point differential equation, over the last 10 years the average prediction for a team's win total was off by .976 wins. Basically this means that an 8-8 team would normally be predicted somewhere in the range of 7-9 to 9-7, which is very good for using a single number to predict the entirety of a team's success. Even one blowout win or loss can skew the numbers up or down, so it is difficult to expect much more than we got from this test. Precision-wise, the errors came out with a standard deviation of .827 wins, which means the predictions were very consistent. With these numbers in mind, we must look at the Pythagorean results to see whether this method is the cream of the crop or just a lesser substitute for when a better method is not available.
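
For readers who want to reproduce this kind of summary, below is a small sketch of how an average miss and its spread could be computed. It assumes the "average" is the mean of the absolute errors and the "standard deviation" is the population standard deviation of those same absolute errors; the spreadsheet may have defined either one differently. The numbers in the example are made up.

```python
from statistics import mean, pstdev


def error_summary(predicted, actual):
    """Return (mean absolute error, population std dev of the absolute errors)."""
    abs_errors = [abs(p - a) for p, a in zip(predicted, actual)]
    return mean(abs_errors), pstdev(abs_errors)


# Made-up predicted and actual win totals for four teams.
predicted = [9.2, 6.5, 11.0, 8.1]
actual = [10, 6, 12, 7]

avg_miss, spread = error_summary(predicted, actual)
print(avg_miss, spread)
```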

With all of the data in the spreadsheet and set up to average the results across every team, I began to adjust the exponent until I got the best results possible. After all the adjusting was done, the best exponent came out to be 2.6, which resulted in an average miss of .965 wins per team, just slightly better than our result using the other method. Its standard deviation was also better, coming in at .708 wins, about .1 wins lower than the point differential method, so it is the more precise of the two. With all of this data in mind, does the fact that the results came in this way really mean that the Pythagorean method is superior to the point differential method?
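
The exponent hunt described above was done by hand in a spreadsheet, but the same idea can be sketched as a simple search in Python. This is only an illustration under my own assumptions: the function names, the search range of 1.0 to 4.0, the step of 0.1, and the 16-game season are all hypothetical choices, and "best" is taken to mean the smallest average miss in wins.

```python
def mean_abs_error(teams, exponent, games=16):
    """Average miss, in wins, of the Pythagorean prediction for one exponent.

    `teams` is a list of (points_scored, points_allowed, actual_wins) tuples.
    """
    total = 0.0
    for scored, allowed, wins in teams:
        s = scored ** exponent
        a = allowed ** exponent
        total += abs(games * s / (s + a) - wins)
    return total / len(teams)


def best_exponent(teams, low=1.0, high=4.0, step=0.1):
    """Try every exponent from low to high and keep the one with the smallest miss."""
    steps = round((high - low) / step)
    candidates = [round(low + i * step, 10) for i in range(steps + 1)]
    return min(candidates, key=lambda x: mean_abs_error(teams, x))


# Made-up team seasons: (points scored, points allowed, actual wins).
sample = [(420, 310, 11), (300, 380, 5), (350, 345, 8), (275, 400, 4)]
print(best_exponent(sample))
```

With only a handful of made-up teams the answer means nothing, but fed the full 10 years of data this kind of search should land on the same exponent as adjusting the cell by hand.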

In my opinion, based on seeing the data and the difficulty of doing the work to find each of the results, I would give a very qualified yes. Pythagorean came out to be the more accurate and more precise method over the last 10 years, even when the data is divided into the first and last 5 years of the study. The point differential method was the better measure in only 4 of the 10 years and always by extremely thin margins (in one case it was better by one one-thousandth), whereas many times the Pythagorean method was better by up to .3 wins. Going strictly by the numbers, though, point differential cannot be shown to be a worse method within the margin of error. What pushed my answer into yes territory is the work required. Finding the exponent takes all of 1 or 2 minutes if you have set your spreadsheet up correctly, and once it is found it is easy to test future and past numbers with it. The point differential method requires some additional spreadsheet set-up and requires graphing, which simply takes more time. In addition, the Pythagorean method is simpler to remember, since both the form of the equation and the one number that must be memorized, the exponent, are very simple. The linear equation, while not hard to remember in its form, involves two longer decimals. It's not a huge deal, but if you are trying to get calculations done quickly, Pythagorean is quicker and, according to what we have shown (though not with statistical significance), more accurate.

In conclusion, what the data does conclusively show is that both of these methods are extremely effective at taking a single statistical measure (or two, depending on how you count it) and predicting how a team's season will turn out. There may be ways to adjust these numbers to come up with a statistically better method of predicting a team's win total, but I have not happened upon one yet. If any reader has found a better method and would like to talk about it here, or tell me about it and let me talk about it, drop me a line at tmx117@gmail.com. Tomorrow, I will delve deeper into the current numbers to come up with a list of observations that can be made from them, but until then, don't forget to carry the one (HAHA! not funny at all).

(Note: If you have an interest in playing with the numbers I used for this study, the exact spreadsheet I used is available here.)
