R: Calculate means for subset of a group

Tags:

r

aggregate

data.table

I want to calculate the mean for each "Day" but for a portion of the day (Time=12-14). This code works for me but I have to enter each day as a new line of code, which will amount to hundreds of lines.

This seems like it should be simple to do. I've done this easily when the grouping variables are the same, but I don't know how to do it when I don't want to include all values for the day. Is there a better way to do this?

sapply(sap[sap$Day==165 & sap$Time %in% c(12,12.1,12.2,12.3,12.4,12.5,13,13.1,13.2,13.3,13.4,13.5, 14), ],mean)

sapply(sap[sap$Day==166 & sap$Time %in% c(12,12.1,12.2,12.3,12.4,12.5,13,13.1,13.2,13.3,13.4,13.5, 14), ],mean)

Here's what the data looks like:

Day Time    StomCond_Trunc
165 12      33.57189926
165 12.1    50.29437636
165 12.2    35.59876214
165 12.3    24.39879768

asked Feb 18 '12 16:02

steph

1 Answer

If you have a large dataset, you may also want to look into the data.table package. Converting a data.frame to a data.table is quite easy.

Example:

Large(ish) dataset

df <- data.frame(Day=1:1000000,Time=sample(1:14,1000000,replace=T),StomCond_Trunc=rnorm(100000)*20)

Using aggregate on the `data.frame`

system.time(aggregate(StomCond_Trunc~Day,data=subset(df,Time>=12 & Time<=14),mean))
#   user  system elapsed 
# 16.255   0.377  24.263
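To see how this replaces the one-line-per-day approach from the question, here is a minimal sketch on a toy version of `sap`. The first rows come from the question's sample; the `Time` of 15 and all day-166 rows are invented for illustration. It filters with a range comparison instead of listing every time value in `%in%`, which also avoids exact matching on decimal times.

```r
# Toy data shaped like the question's `sap`. The Time = 15 row and the
# day-166 rows are made up for illustration.
sap <- data.frame(
  Day  = c(165, 165, 165, 165, 166, 166, 166),
  Time = c(12, 12.1, 12.2, 15, 11.5, 12, 13),
  StomCond_Trunc = c(33.57189926, 50.29437636, 35.59876214, 24.39879768,
                     10, 20, 40)
)

# Keep only rows in the 12-14 window, then average within each Day.
midday <- subset(sap, Time >= 12 & Time <= 14)
day_means <- aggregate(StomCond_Trunc ~ Day, data = midday, FUN = mean)
print(day_means)
```

The result has one row per day, so hundreds of days need no extra lines of code.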

Converting it to a `data.table`

library(data.table)
dt <- data.table(df,key="Time")

system.time(dt[Time>=12 & Time<=14,mean(StomCond_Trunc),by=Day])
#   user  system elapsed 
#  9.534   0.178  15.270
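The same query on a small table might look like the sketch below. It assumes the data.table package is installed; the `Time` of 15 and all day-166 rows are invented for illustration, and `mean_cond` is just an illustrative name for the result column (without it, data.table names the column `V1`).

```r
library(data.table)

# Toy data shaped like the question's `sap`. The Time = 15 row and the
# day-166 rows are made up for illustration.
sap <- data.table(
  Day  = c(165, 165, 165, 165, 166, 166, 166),
  Time = c(12, 12.1, 12.2, 15, 11.5, 12, 13),
  StomCond_Trunc = c(33.57189926, 50.29437636, 35.59876214, 24.39879768,
                     10, 20, 40)
)

# i: filter rows to the 12-14 window; j: compute the mean; by: group per Day.
day_means <- sap[Time >= 12 & Time <= 14,
                 .(mean_cond = mean(StomCond_Trunc)),
                 by = Day]
print(day_means)
```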

Update from Matthew: this timing has improved dramatically since the answer was originally written, due to a new optimization feature in data.table 1.8.2.

Retesting the difference between the two approaches, using data.table 1.8.2 in R 2.15.1:

df <- data.frame(Day=1:1000000,
                 Time=sample(1:14,1000000,replace=T),
                 StomCond_Trunc=rnorm(100000)*20)
system.time(aggregate(StomCond_Trunc~Day,data=subset(df,Time>=12 & Time<=14),mean)) 
#   user  system elapsed 
#  10.19    0.27   10.47

library(data.table)
dt <- data.table(df,key="Time")
system.time(dt[Time>=12 & Time<=14,mean(StomCond_Trunc),by=Day]) 
#   user  system elapsed 
#   0.31    0.00    0.31

answered Sep 25 '22 02:09

Maiasaura
