Sunday, 20 February 2011

Sunday Numbers 2.0, Vol. 1: Return to football and mortality.

Over the past few weeks, I've received e-mails from readers saying they missed the Wednesday Math posts. I was a little surprised, but I did do 130 of them, so I could see how folks felt like it was part of the routine.

Right now, prepping classes is cutting into my precious blogging time during the week, so I am going to resurrect the Wednesday tradition on Sundays instead. I'm calling it Sunday Numbers 2.0 to distinguish these new posts from the first set of Sunday Numbers in 2008, which used my system called Confidence of Victory to predict the result of the presidential election. Those predictions were remarkably close to the actual landslide electoral victory of Obama over McCain, thank Odin, Krishna and the li'l baby Jesus.


About a year ago, I did a couple posts on professional sports and mortality. This week, a fellow named Jim Zimmerman who runs oldestlivingprofootball.com added a comment to the thread from last year, so I gave his site a visit.

A website full of numerical data sorted in an easy to understand way.

Honestly, I couldn't be happier if you sent me beer.

I took a couple hours to get the data, change the dates of birth and death to years instead of specific days, and sort it both by year of death and year of birth. Here are some of my early findings.



How has the Age of Steroids affected mortality of NFL players? This chart shows the average age at death of players who died in each year from 1980 to 2010. The general public started noticing steroid use in baseball in the late 1990s, but the premature deaths of John Matuszak and Lyle Alzado a decade earlier made steroid use in football a topic of conversation then. As we can see, the general trend is upwards, as it is for the public in general.

There is a fact that skews the data in favor of longer life expectancy in more recent years, and it has nothing to do with improved health. More football players are living to be 90 or more because there were more professionals as time went on. If a man died in 1980 and he was more than 90, he had to be born in 1890 or earlier. If a man in his nineties died last year, he was born between 1910 and 1920, and probably played football in the 1940s or 1950s. There were more football players in that era than in the earlier era simply because the league started in the early 1920s.


Are more football players dying young as we move forward in time? Again, I looked at the years of death 1980 to 2010. If someone died young in 1980, they likely played the game in the era from 1950 to 1980, which was not the age of steroids. For those who died young in 2010, their playing days would have been in the era of 1980 to 2010, when steroid use is assumed to be more prevalent. Looking at the graph to the left, we see the percentage of NFL players dying before the age of 50 fluctuates quite a bit year by year and the trendline (or line of regression) is almost flat. More than that, the correlation coefficient is incredibly weak, so the data does NOT let us state that steroid use has been a significant factor in premature death of the population of NFL players.

How do the ages at death of NFL players compare to the general population? For this question, I needed some sample from the general population that would be fair to compare to the list of NFL players who died in 2010. My method was to look at recent obituaries from the Associated Press. Both these data sets would be very different from a list of the deceased at a hospital, because neither list of celebrated people is going to include any infant deaths or deaths of teenagers. In the A.P. obituaries, I excluded anyone whose claim to fame was being the Oldest Living Person, and I only took the obits that mentioned the age at death in the first paragraph. Here are the statistics I used for my tests.

2010 deaths of former NFL players
n = 134, average = 73.28, standard deviation = 16.82
% under 50 = 10.4%
% 50 to 59 = 9.0%
% 60 to 69 = 19.4%
% 70 to 79 = 16.4%
% 80 to 89 = 32.8%
% 90 and over = 11.9%

100 deaths from A.P. obituaries, late 2010 to early 2011
n = 100, average = 77.65, standard deviation = 14.99
% under 50 = 5%
% 50 to 59 = 7%
% 60 to 69 = 13%
% 70 to 79 = 23%
% 80 to 89 = 28%
% 90 and over = 24%

5% of celebrities who died were under 50 compared to 10.4% of football players. Is that significant? Good question, hypothetical question asker. With sample sizes this small and each set split into just two groups, under 50 and 50 or over, the chi-square test does not give us a test statistic that reaches even the 90% threshold. (test stat = 2.278, 90% threshold = 2.706.)

If instead we do a chi-square test and split both data sets into six categories (under 50, 50-59, 60-69, 70-79, 80-89, 90 and over), we get a test stat that does cross the 90% threshold, but not the 95%. (test stat = 10.368, 90% threshold = 9.236, 95% threshold = 11.071.) The category that adds the most to the test stat is the 90 and over group, and this can be at least in part attributed to league expansion. In 1960, the American Football League began, effectively doubling the number of professional football teams. When the leagues merged in 1970, there were 26 teams. There are now 32, but this increase is not as significant as the big jump in 1960.
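For readers who want to check the arithmetic, here is a minimal Python sketch of the two chi-square tests described above. The counts are an assumption: they are reconstructed by multiplying the posted percentages by the sample sizes (134 and 100) and rounding, not copied from the original spreadsheet. The function name chi_square_stat is just an illustrative label.

```python
def chi_square_stat(row_a, row_b):
    """Pearson chi-square statistic for a 2 x k table of observed counts."""
    total_a, total_b = sum(row_a), sum(row_b)
    grand = total_a + total_b
    stat = 0.0
    for a, b in zip(row_a, row_b):
        col = a + b
        # Expected counts if both groups shared the same age distribution.
        exp_a = col * total_a / grand
        exp_b = col * total_b / grand
        stat += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b
    return stat

# Assumed counts, reconstructed from the posted percentages.
# Order: under 50, 50-59, 60-69, 70-79, 80-89, 90 and over.
nfl = [14, 12, 26, 22, 44, 16]   # sums to 134
ap = [5, 7, 13, 23, 28, 24]      # sums to 100

# Two groups per list: under 50 versus 50 or over (1 degree of freedom).
two_way = chi_square_stat([nfl[0], sum(nfl[1:])], [ap[0], sum(ap[1:])])

# All six age categories (5 degrees of freedom).
six_way = chi_square_stat(nfl, ap)

print(f"two groups: {two_way:.3f} (90% threshold 2.706)")
print(f"six groups: {six_way:.3f} (90% threshold 9.236, 95% threshold 11.071)")
```

With these reconstructed counts the two statistics should land very close to the 2.278 and 10.368 quoted in the post. If SciPy is installed, scipy.stats.chi2_contingency with correction=False should give the same statistic along with a p-value.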

The general celebrity list had an average age at death about four years higher than the NFL list from 2010. Is that difference significant? Yes, it is. The test statistic t = 2.093 does get above the 95% significance threshold. Part of this is because a greater percentage of the celebrities than of the football players died at 90 or older, and again that can be partly attributed to league expansion. If instead we try to factor this out by looking only at deaths at ages 89 or less, of course the average ages of both groups go down. Here are the new numbers for those two data sets.

2010 deaths of former NFL players, 89 and younger
n = 119, average = 70.52, standard deviation = 16.0

deaths from A.P. obituaries, late 2010 to early 2011, 89 and younger
n = 76, average = 72.45, standard deviation = 13.32

Besides the averages going down, the difference goes from 4 to 2, the data sets get smaller and the standard deviations shrink slightly. The shrinking standard deviations would tend to increase the test statistic, but that small pressure to go up is overwhelmed by the smaller difference and sample sizes. The test statistic t = 0.911, which is not statistically significant at all.

What if we leave the NFL players alone and remove half the over 90 deaths from the A.P. list? Hypothetical question asker, that's not a bad idea for how to adjust the data to compensate for league expansion. Let's give that a shot.

2010 deaths of former NFL players
n = 134, average = 73.28, standard deviation = 16.82

100 deaths from A.P. obituaries, late 2010 to early 2011, half the over 90s removed
n = 88, average = 75.65, standard deviation = 14.66

The new test stat is t = 1.11, not statistically significant at these data set sizes.
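The three t statistics in this post can be rechecked from the summary numbers alone. The post does not say which form of the two-sample t test was used, so this minimal Python sketch assumes the unpooled (Welch-style) standard error, which appears to reproduce the quoted values. The function name two_sample_t is an illustrative label.

```python
from math import sqrt

def two_sample_t(n1, mean1, sd1, n2, mean2, sd2):
    """Two-sample t statistic with an unpooled standard error (assumed form)."""
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean2 - mean1) / standard_error

# Full lists: NFL players who died in 2010 versus the 100 A.P. obituaries.
t_all = two_sample_t(134, 73.28, 16.82, 100, 77.65, 14.99)

# Both lists restricted to deaths at 89 and younger.
t_under_90 = two_sample_t(119, 70.52, 16.0, 76, 72.45, 13.32)

# Full NFL list versus the A.P. list with half the over 90s removed.
t_half_removed = two_sample_t(134, 73.28, 16.82, 88, 75.65, 14.66)

print(f"all deaths:          t = {t_all:.3f}")
print(f"89 and younger:      t = {t_under_90:.3f}")
print(f"half of 90s removed: t = {t_half_removed:.3f}")
```

Only the first value clears the roughly 1.97 cutoff for a two-tailed test at the 95% level with samples this size. If SciPy is installed, scipy.stats.ttest_ind_from_stats with equal_var=False takes the same summary numbers and also returns a p-value.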

==

I want to thank Jim Zimmerman once again for maintaining this very nice website for mortality statistics of former professional football players. I still have all the info in an Excel file, so there may be more data mining in the future for my Sunday Numbers 2.0 posts.

Whew, that's lotsa 'splainin'. Glad it's a Sunday.

Next week: Perpendicular.
