I'm sure this question has been posed before, but would like some input on my specific question. In return for your help, I'll use an interesting example.
Sean Lahman provides giant datasets of MLB baseball statistics, available free on his website (http://www.seanlahman.com/baseball-archive/statistics/).
I'd like to use this data to answer the following question: What is the average number of home runs per game recorded for each decade in the MLB?
Below I've pasted all relevant script:
teamdata = read.csv("Teams.csv", header = TRUE)
decades = c(1870,1880,1890,1900,1910,1920,1930,1940,1950,1960,1970,1980,1990,2000,2010,2020)
i = 0
meanhomers = c()
for(i in c(1:length(decades))){
meanhomers[i] = mean(teamdata$HR[teamdata$yearID>=decades[i] & teamdata$yearID<decades[i+1]]);
i = i+1
}
My primary question is, how could this answer have been determined without resorting to the dreaded for-loop?
Side question: What simple script would have generated the decades vector for me?
(For those interested in the answer to the baseball question, see below.)
meanhomers
[1] 4.641026 23.735849 34.456522 20.421053 25.755682 61.837500 84.012500
[8] 80.987500 130.375000 132.166667 120.093496 126.700000 148.737410 173.826667
[15] 152.973333 NaN
Edit for clarity: Turns out I answered the wrong question; the answer provided above indicates the number of home runs per team per year, not per game. A little fix of the denominator would get the correct result.
Here's a data.table
example. Because others showed how to use cut
, I took another route for splitting the data into decades:
teamdata[,list(HRperYear=mean(HR)),by=10*floor((yearID)/10)]
However, the original question mentions average HRs per game, not per year (though the code and answers clearly deal with HRs per year).
Here's how you could compute average HRs per game (and average games per team per year):
teamdata[,list(HRperYear=mean(HR),HRperGame=sum(HR)/sum(G),games=mean(G)),by=10*floor(yearID/10)]
floor HRperYear HRperGame games
1: 1870 4.641026 0.08911866 52.07692
2: 1880 23.735849 0.21543555 110.17610
3: 1890 34.456522 0.25140108 137.05797
4: 1900 20.421053 0.13686067 149.21053
5: 1910 25.755682 0.17010657 151.40909
6: 1920 61.837500 0.40144445 154.03750
7: 1930 84.012500 0.54593453 153.88750
8: 1940 80.987500 0.52351325 154.70000
9: 1950 130.375000 0.84289640 154.67500
10: 1960 132.166667 0.81977946 161.22222
11: 1970 120.093496 0.74580935 161.02439
12: 1980 126.700000 0.80990313 156.43846
13: 1990 148.737410 0.95741873 155.35252
14: 2000 173.826667 1.07340167 161.94000
15: 2010 152.973333 0.94427984 162.00000
(The low average game totals in the 1980's and 1990's are due to the 1981 and 1994-5 player strikes).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With