
Avoid For-Loops in R

Tags:

for-loop

r

I'm sure this question has been posed before, but I would like some input on my specific case. In return for your help, I'll use an interesting example.

Sean Lahman provides giant datasets of MLB baseball statistics, available free on his website (http://www.seanlahman.com/baseball-archive/statistics/).

I'd like to use this data to answer the following question: What is the average number of home runs per game recorded for each decade in the MLB?

Below I've pasted all relevant script:

teamdata = read.csv("Teams.csv", header = TRUE)

decades = c(1870,1880,1890,1900,1910,1920,1930,1940,1950,1960,1970,1980,1990,2000,2010,2020)

meanhomers = c()
# decades[i+1] is NA on the last pass, which is why the final entry comes out as NaN
for (i in seq_along(decades)) {
    meanhomers[i] = mean(teamdata$HR[teamdata$yearID >= decades[i] & teamdata$yearID < decades[i+1]])
}

My primary question is, how could this answer have been determined without resorting to the dreaded for-loop?

Side question: What simple script would have generated the decades vector for me?

(For those interested in the answer to the baseball question, see below.)

meanhomers
 [1]   4.641026  23.735849  34.456522  20.421053  25.755682  61.837500  84.012500
 [8]  80.987500 130.375000 132.166667 120.093496 126.700000 148.737410 173.826667
[15] 152.973333   NaN

Edit for clarity: Turns out I answered the wrong question; the answer provided above indicates the number of home runs per team per year, not per game. A little fix of the denominator would get the correct result.

adyo4552 asked Dec 05 '22 01:12



1 Answer

Here's a data.table example. Because others showed how to use cut, I took another route for splitting the data into decades:

setDT(teamdata)  # needs library(data.table); converts the read.csv data.frame in place
teamdata[, list(HRperYear = mean(HR)), by = 10 * floor(yearID / 10)]
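For anyone without data.table, the same decade grouping can be sketched in base R. This is only an illustration: the `toy` data frame and its values are made up as a stand-in for Teams.csv, and `by_floor` and `by_cut` are names I chose for the example. It also covers the side question, since `seq` generates the decades vector.

```r
# Toy stand-in for Teams.csv (hypothetical values, for illustration only)
toy <- data.frame(
  yearID = c(1871, 1879, 1880, 1885, 1999, 2000),
  HR     = c(2, 4, 10, 20, 150, 170)
)

# Side question: generate the decade starts instead of typing them out
decades <- seq(1870, 2020, by = 10)

# Route 1: map each year to the start of its decade (1885 -> 1880),
# then average HR within each group without an explicit loop
by_floor <- tapply(toy$HR, 10 * floor(toy$yearID / 10), mean)

# Route 2: bin years with cut(); right = FALSE gives [1870, 1880), [1880, 1890), ...
# which matches the >= / < comparison in the question's loop
bins   <- cut(toy$yearID, breaks = decades, right = FALSE,
              labels = head(decades, -1))
by_cut <- tapply(toy$HR, bins, mean)

print(by_floor)
print(by_cut)
```

The floor route only reports decades that actually occur in the data, while the cut route keeps every bin and returns NA for the empty ones.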

However, the original question mentions average HRs per game, not per year (though the code and answers clearly deal with HRs per year).

Here's how you could compute average HRs per game (and average games per team per year):

teamdata[, list(HRperYear = mean(HR), HRperGame = sum(HR) / sum(G), games = mean(G)),
         by = 10 * floor(yearID / 10)]

    floor  HRperYear  HRperGame     games
 1:  1870   4.641026 0.08911866  52.07692
 2:  1880  23.735849 0.21543555 110.17610
 3:  1890  34.456522 0.25140108 137.05797
 4:  1900  20.421053 0.13686067 149.21053
 5:  1910  25.755682 0.17010657 151.40909
 6:  1920  61.837500 0.40144445 154.03750
 7:  1930  84.012500 0.54593453 153.88750
 8:  1940  80.987500 0.52351325 154.70000
 9:  1950 130.375000 0.84289640 154.67500
10:  1960 132.166667 0.81977946 161.22222
11:  1970 120.093496 0.74580935 161.02439
12:  1980 126.700000 0.80990313 156.43846
13:  1990 148.737410 0.95741873 155.35252
14:  2000 173.826667 1.07340167 161.94000
15:  2010 152.973333 0.94427984 162.00000

(The low average game totals in the 1980s and 1990s are due to the 1981 and 1994-95 player strikes.)
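The per-game figure above divides total home runs by total games within each decade, which is not the same as averaging each team's own HR/G ratio. A rough base R sketch of the difference, using made-up numbers (the `toy` data frame and the names `per_decade` and `mean_of_ratios` are hypothetical, not part of the Lahman data):

```r
# Toy stand-in for Teams.csv (hypothetical values, for illustration only)
toy <- data.frame(
  yearID = c(1981, 1985, 1995),
  HR     = c(50, 150, 160),
  G      = c(100, 150, 160)
)
toy$decade <- 10 * floor(toy$yearID / 10)

# Total HR and total games per decade, then one ratio per decade
per_decade <- aggregate(cbind(HR, G) ~ decade, data = toy, FUN = sum)
per_decade$HRperGame <- per_decade$HR / per_decade$G

# Alternative: average the team-level ratios, which weights a short
# (strike-shortened) season the same as a full one
mean_of_ratios <- tapply(toy$HR / toy$G, toy$decade, mean)

print(per_decade)
print(mean_of_ratios)
```

In this toy data the 1980s come out as 200/250 = 0.8 with the sum-over-sum approach but 0.75 when the two team ratios (0.5 and 1.0) are averaged, so the choice of denominator matters when season lengths differ.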

BCC answered Dec 26 '22 02:12
