How does one aggregate and summarize data quickly?

Tags: r, data.table, plyr

I have a dataset whose headers look like so:

PID Time Site Rep Count

I want to sum Count by Rep within each PID x Time x Site combo. Then, on the resulting data.frame, I want the mean value of Count for each PID x Time x Site combo.

My current function is as follows:

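dummy <- function(data) {
  # Sum Count within each PID x Time x Site x Rep group
  A <- aggregate(Count ~ PID + Time + Site + Rep, data = data, function(x) sum(na.omit(x)))
  # Average the per-Rep sums within each PID x Time x Site group
  B <- aggregate(Count ~ PID + Time + Site, data = A, mean)
  return(B)
}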

This is painfully slow (the original data.frame is 510000 x 20). Is there a way to speed this up with plyr?


asked Oct 11 '11 07:10

Maiasaura

1 Answer

You should look at the package data.table for faster aggregation operations on large data frames. For your problem, the solution would look like:

library(data.table)
# data_tab is the original data.frame; column names are case-sensitive, so it is Count, not count
data_t = data.table(data_tab)
ans = data_t[, list(A = sum(Count), B = mean(Count)), by = 'PID,Time,Site']
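Note that the one-liner above sums and averages the raw Count values per PID x Time x Site group, which is not quite the two-step calculation in the question (sum within each Rep first, then average those sums). A minimal sketch of the two-step version in data.table might look like the following. The toy data frame and the names dt, per_rep and result are made up for illustration, and na.rm = TRUE is assumed to be an acceptable stand-in for the na.omit() call in the question.

```r
library(data.table)

# Toy data standing in for the real table (values are invented for illustration).
data <- data.frame(
  PID   = c(1, 1, 1, 1, 2, 2),
  Time  = c(1, 1, 1, 1, 1, 1),
  Site  = c("a", "a", "a", "a", "b", "b"),
  Rep   = c(1, 1, 2, 2, 1, 2),
  Count = c(2, NA, 4, 6, 1, 3)
)

dt <- as.data.table(data)

# Step 1: total Count within each PID x Time x Site x Rep group, ignoring NAs.
per_rep <- dt[, list(Count = sum(Count, na.rm = TRUE)),
              by = list(PID, Time, Site, Rep)]

# Step 2: mean of the per-Rep totals within each PID x Time x Site group.
# Here PID 1 gives mean(2, 10) = 6 and PID 2 gives mean(1, 3) = 2.
result <- per_rep[, list(Count = mean(Count)), by = list(PID, Time, Site)]
print(result)
```

One difference worth checking on real data: the formula interface of aggregate() drops rows with a missing Count before grouping, so a Rep whose Count values are all NA disappears from the result, whereas the sketch above keeps that Rep with a sum of 0.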


answered Dec 18 '22 08:12

Ramnath
