Predicting the NBA Part #3

I never anticipated this growing into a three-part series, but the feedback I have received since Part #2, along with my own curiosity, has pushed this project further (Part #1 here / Part #2 here). In Part #2 we found that point differential was an extremely accurate predictor of NBA success, judging by how the graph looked and by its r^2 value. Unbelievably, though, I never bothered to calculate how well its predictions actually matched real results. That made it difficult to defend when, over at Arm Chair GM (where I have been cross-posting my last few articles), someone asked why I had not simply tested the commonly used Pythagorean formula (so named because its exponents resemble those in the Pythagorean theorem), which was developed over the years for predicting team success. The formula states that if you raise a team's total points scored to some power, and divide that by the sum of its points scored raised to the same power and its points allowed raised to the same power, you get the team's winning percentage within some average margin of error. The exponent depends on the sport, since each sport has its own best-fitting value. So the question for the day is: does this formula, derived by past statisticians, work better than simply using the linear equation built from point differential?
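Written out, the formula is win% = PF^k / (PF^k + PA^k), where PF is points scored, PA is points allowed, and k is the sport-specific exponent. Below is a minimal Python sketch of it. The function names are my own, and the 82-game default simply converts a percentage into NBA wins. The code uses the algebraically equivalent form 1 / (1 + (PA/PF)^k), which keeps the numbers small even when season point totals are raised to a large exponent.

```python
def pythagorean_win_pct(points_for: float, points_against: float, exponent: float) -> float:
    """Expected winning percentage PF^k / (PF^k + PA^k).

    Computed as 1 / (1 + (PA/PF)^k), which is algebraically the same but
    avoids raising large season totals to a large power.
    """
    return 1.0 / (1.0 + (points_against / points_for) ** exponent)


def pythagorean_wins(points_for: float, points_against: float, exponent: float,
                     games: int = 82) -> float:
    """Scale the expected winning percentage to a season of `games` games."""
    return games * pythagorean_win_pct(points_for, points_against, exponent)


# Illustrative, made-up inputs: a team scoring 110 and allowing 105 per 100 possessions,
# using the exponent fitted later in this post.
print(round(pythagorean_wins(110.0, 105.0, 13.231), 1))
```

A team that scores exactly as much as it allows lands at .500 (41 wins) no matter what exponent is used; the exponent only controls how quickly the expected record moves away from .500 as the gap grows.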

As in Part #2, the resulting data sets are too large to post effectively on the page itself, but every bit of the data I used is available for download as an Excel spreadsheet here.

First, I set out to test the precision and accuracy of the point differential method, using the equation developed in Part #2. Over the last 10 NBA seasons, that method's predictions were off by an average of 2.36 wins per team, which is a very high degree of accuracy (a team predicted to win 42 games would most likely win between 40 and 44). It also proved to be quite precise: across all of the data combined, the errors had a standard deviation of only 1.87 wins. Even when the data is split into the first 5 and last 5 seasons, the results are essentially the same, varying only in the hundredths place. With this in hand, we know that point differential is an extremely accurate predictor of team success, but without the Pythagorean formula for comparison, we have no way of knowing whether it is the inferior tool.

When the same data is run through the Pythagorean formula, the results come out extremely similar to those above. (The standard formula uses total points scored and allowed rather than points per 100 possessions, but since we are fitting the exponent ourselves and the two should be in essentially the same ratio, this should not matter significantly.) After running all of the data through the equation and finding the best-fitting exponent (13.231), the average difference from the actual win total came out to 2.396 wins per team (again, a predicted 42-win team would most likely win between 40 and 44 games), only the slightest bit more than the point differential method. It was a smidgen more precise, though, with a standard deviation of 1.77 wins. When the data was once again split into the first and last 5 seasons, the results only began to vary in the hundredths place. Like the point differential method, this proved to be an extremely accurate predictor of NBA team success.
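Finding the exponent amounts to trying many candidate values and keeping the one whose predictions best match reality. The post does not say exactly how the candidates were chosen or what "most accurate" was measured by, so this Python sketch assumes a simple evenly spaced grid search that minimizes mean absolute error in wins. The function name, search range, and step size are hypothetical; a step of 0.001 would be needed to report three decimals like 13.231, at the cost of a slower search.

```python
def best_exponent(ratings, actual_wins, games=82, lo=5.0, hi=20.0, step=0.01):
    """Grid search for the exponent k that minimizes mean absolute error in wins.

    ratings: (points_for, points_against) pairs, one per team-season.
    actual_wins: the matching real win totals.
    Returns (best_k, best_mae).
    """
    best_k, best_mae = lo, float("inf")
    n_steps = int(round((hi - lo) / step))
    for i in range(n_steps + 1):
        k = lo + i * step  # computed from the index to avoid accumulating float error
        total_error = 0.0
        for (pf, pa), w in zip(ratings, actual_wins):
            predicted = games / (1.0 + (pa / pf) ** k)  # Pythagorean expected wins
            total_error += abs(predicted - w)
        mae = total_error / len(actual_wins)
        if mae < best_mae:
            best_k, best_mae = k, mae
    return best_k, best_mae


# Synthetic sanity check (made-up ratings): wins generated with k = 13
# should lead the search back to an exponent of about 13.
ratings = [(110.0, 100.0), (100.0, 105.0), (104.0, 104.0), (98.0, 108.0), (112.0, 103.0)]
true_wins = [82 / (1 + (pa / pf) ** 13) for pf, pa in ratings]
print(best_exponent(ratings, true_wins))
```

A grid search is crude but transparent, and it mirrors testing exponents by hand in a spreadsheet. A numerical optimizer would converge faster but is not needed for a few hundred team-seasons.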

With all of this data in hand, can we conclusively say that one method is better than the other? In terms of results, we cannot. In both accuracy and precision, the two methods differed by no more than about a tenth of a win (2.36 vs. 2.396 wins of average error, 1.87 vs. 1.77 wins of standard deviation), so for all practical purposes they are equally capable (a tenth of a win is not going to help anyone predict anything). In terms of the effort it takes to make predictions, I also do not see a big difference between the two. The point differential method required me to create a quick graph and copy down the equation of its fitted line, while the Pythagorean method required me to test numerous exponents against the data. Both took only a small amount of time, and neither required more effort or skill than the other. The one possible edge for the Pythagorean method is that, once the exponent is found, it is easier to remember that single value and the equation it goes into than the two values (slope and intercept) of the point differential line, but neither method would tax the brain too greatly. Therefore, I believe neither method is significantly more valid for predicting an NBA team's win total from its points scored and allowed, and neither is a significantly simpler or quicker way of making that determination.
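To repeat this side-by-side comparison on your own copy of the data, both methods should be scored on exactly the same team-seasons, over the full ten years and over each five-year half. The Python sketch below assumes you already have each method's predicted wins per team-season; the function names, the `season_years` labels, and the `split_year` cutoff are hypothetical, and the tiny example is made up.

```python
import statistics


def summarize(abs_errors):
    """Mean and sample standard deviation of absolute errors (needs 2+ values)."""
    return statistics.mean(abs_errors), statistics.stdev(abs_errors)


def compare_methods(season_years, linear_pred, pyth_pred, actual, split_year):
    """Return {subset: {method: (mean abs error, std dev)}} for both methods.

    Rows with season < split_year form the "first half", the rest the
    "second half". Each subset needs at least two rows for stdev.
    """
    rows = list(zip(season_years, linear_pred, pyth_pred, actual))
    subsets = {
        "all": rows,
        "first half": [r for r in rows if r[0] < split_year],
        "second half": [r for r in rows if r[0] >= split_year],
    }
    report = {}
    for name, subset in subsets.items():
        linear_err = [abs(lp - a) for _, lp, _, a in subset]
        pyth_err = [abs(pp - a) for _, _, pp, a in subset]
        report[name] = {"linear": summarize(linear_err), "pythagorean": summarize(pyth_err)}
    return report


# Made-up example: four team-seasons, split at the 2006 season.
report = compare_methods(
    season_years=[2001, 2001, 2006, 2006],
    linear_pred=[40, 42, 50, 30],
    pyth_pred=[41, 41, 50, 31],
    actual=[41, 40, 48, 33],
    split_year=2006,
)
for subset, methods in report.items():
    print(subset, methods)
```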

With that said, the quest for prediction perfection is still far from over. I believe there are ways this data could be further manipulated to predict a team's success more accurately. I ran a couple of quick tests to see if I could improve the results, but none proved very successful. I tried rewarding teams that have high-PER players with a win bonus, which generally improved accuracy slightly but decreased precision. I also tried adjusting the win total by the ratio of points scored to points surrendered divided by various numbers, but that was a complete failure. If anyone would like to test new ideas, simply drop me a line at tmx117@gmail.com or download the spreadsheet and give it a shot yourself. Once again, the data was conclusive in proving one well-known fact: if you wanna win the game, score more points than the other team.

(Check Sheet 2 of the spreadsheet for what's coming up later this week.)
