 

Apply function conditionally


I have a dataframe like this:

```
experiment iter  results
     A       1     30.0
     A       2     23.0
     A       3     33.3
     B       1    313.0
     B       2    323.0
     B       3    350.0
  ....
```

Is there a way to tally results by applying a function conditionally? In the example above, the condition is "all iterations of a particular experiment", so I want:

```
A   sum of results (30 + 23 + 33.3)
B   sum of results (313 + 323 + 350)
```

I was thinking of the apply function, but can't find a way to get it to work.
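To make the example reproducible, here is a minimal sketch that rebuilds the data frame from the six rows shown above (the rows elided by "...." are not included). The name DF matches what the answer uses; storing experiment as character and iter as integer is an assumption about the original data.

```r
# Rebuild the six visible rows of the example data frame.
# Assumption: 'experiment' is character and 'iter' is integer.
DF <- data.frame(
  experiment = c("A", "A", "A", "B", "B", "B"),
  iter       = c(1L, 2L, 3L, 1L, 2L, 3L),
  results    = c(30.0, 23.0, 33.3, 313.0, 323.0, 350.0),
  stringsAsFactors = FALSE  # keep 'experiment' as character in any R version
)
str(DF)
```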

Oliver asked May 20 '13 at 20:05



1 Answer

There are many ways to do this. Note that if you are interested in a function other than sum, just change the FUN argument (FUN=any.function): if you want the mean, variance, length, etc., plug those functions into FUN, e.g. FUN=mean, FUN=var, and so on. Let's explore some alternatives:

The aggregate function in base R:

```
> aggregate(results ~ experiment, FUN=sum, data=DF)
  experiment results
1          A    86.3
2          B   986.0
```
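As noted at the start of the answer, only FUN changes when you want a different summary. A minimal sketch with aggregate, rebuilding DF from the six rows shown in the question (an assumption; the "...." rows are left out), computing the mean and the number of iterations per experiment:

```r
# Same six rows as in the question (assumed; the "...." rows are omitted).
DF <- data.frame(
  experiment = c("A", "A", "A", "B", "B", "B"),
  iter       = c(1L, 2L, 3L, 1L, 2L, 3L),
  results    = c(30.0, 23.0, 33.3, 313.0, 323.0, 350.0),
  stringsAsFactors = FALSE
)

# The formula and data stay the same; only FUN is swapped.
means  <- aggregate(results ~ experiment, FUN = mean,   data = DF)
counts <- aggregate(results ~ experiment, FUN = length, data = DF)
print(means)
print(counts)
```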

Or maybe tapply?

```
> with(DF, tapply(results, experiment, FUN=sum))
    A     B 
 86.3 986.0 
```

There is also ddply from the plyr package:

```
> # library(plyr)
> ddply(DF[, -2], .(experiment), numcolwise(sum))
  experiment results
1          A    86.3
2          B   986.0
> ## Alternative syntax
> ddply(DF, .(experiment), summarize, sumResults = sum(results))
  experiment sumResults
1          A       86.3
2          B      986.0
```

There is also the dplyr package:

```
> require(dplyr)
> DF %>% group_by(experiment) %>% summarise(sumResults = sum(results))
Source: local data frame [2 x 2]

  experiment sumResults
1          A       86.3
2          B      986.0
```

Using sapply and split, which is equivalent to tapply:

```
> with(DF, sapply(split(results, experiment), sum))
    A     B 
 86.3 986.0 
```
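To see why sapply plus split matches tapply, it helps to look at the intermediate list that split builds. A minimal sketch, again assuming DF holds only the six rows shown in the question: split groups the results by experiment, sapply sums each group, and tapply does both steps in one call (returning a 1-d array instead of a plain named vector).

```r
# Same six rows as in the question (assumed; the "...." rows are omitted).
DF <- data.frame(
  experiment = c("A", "A", "A", "B", "B", "B"),
  iter       = c(1L, 2L, 3L, 1L, 2L, 3L),
  results    = c(30.0, 23.0, 33.3, 313.0, 323.0, 350.0),
  stringsAsFactors = FALSE
)

# split() returns a list with one element per experiment:
# list(A = c(30, 23, 33.3), B = c(313, 323, 350))
groups <- split(DF$results, DF$experiment)
str(groups)

# sapply() applies sum to each element and simplifies to a named vector.
sums_sapply <- sapply(groups, sum)

# tapply() does the split-and-apply in one call and returns a 1-d array.
sums_tapply <- with(DF, tapply(results, experiment, sum))

# Same values and group names; only the container type differs.
print(sums_sapply)
print(sums_tapply)
```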

If speed is a concern, data.table is your friend:

```
> # library(data.table)
> DT <- data.table(DF)
> DT[, sum(results), by=experiment]
   experiment    V1
1:          A  86.3
2:          B 986.0
```

Less popular, but the doBy package is nice (equivalent to aggregate, even in syntax!):

```
> # library(doBy)
> summaryBy(results~experiment, FUN=sum, data=DF)
  experiment results.sum
1          A        86.3
2          B       986.0
```

by also helps in this situation:

```
> (Aggregate.sums <- with(DF, by(results, experiment, sum)))
experiment: A
[1] 86.3
------------------------------------------------------------------------- 
experiment: B
[1] 986
```

If you want the result as a matrix, use either cbind or rbind:

```
> cbind(results=Aggregate.sums)
  results
A    86.3
B   986.0
```
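The cbind/rbind step works on any named vector of group sums, not only on the by object. A small sketch (assuming the six question rows, and computing the sums with sapply and split instead of by) showing that cbind gives a one-column matrix with the experiments as row names, while rbind gives a one-row matrix with the experiments as column names:

```r
# Same six rows as in the question (assumed; the "...." rows are omitted).
DF <- data.frame(
  experiment = c("A", "A", "A", "B", "B", "B"),
  iter       = c(1L, 2L, 3L, 1L, 2L, 3L),
  results    = c(30.0, 23.0, 33.3, 313.0, 323.0, 350.0),
  stringsAsFactors = FALSE
)

# Named vector of group sums: c(A = 86.3, B = 986)
sums <- sapply(split(DF$results, DF$experiment), sum)

# cbind: 2 x 1 matrix, group names become row names
as_column <- cbind(results = sums)
# rbind: 1 x 2 matrix, group names become column names
as_row <- rbind(results = sums)

print(as_column)
print(as_row)
```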

sqldf from the sqldf package could also be a good option:

```
> library(sqldf)
> sqldf("select experiment, sum(results) `sum.results` from DF group by experiment")
  experiment sum.results
1          A        86.3
2          B       986.0
```

xtabs also works, but only when the function you want is sum:

```
> xtabs(results ~ experiment, data=DF)
experiment
    A     B 
 86.3 986.0 
```
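The "only sum" restriction comes from how xtabs treats a left-hand side in the formula: it has no FUN argument and simply sums that variable over each cell of the right-hand side. A minimal check, using the same assumed six rows, that it agrees with tapply(..., sum) once the table class is stripped:

```r
# Same six rows as in the question (assumed; the "...." rows are omitted).
DF <- data.frame(
  experiment = c("A", "A", "A", "B", "B", "B"),
  iter       = c(1L, 2L, 3L, 1L, 2L, 3L),
  results    = c(30.0, 23.0, 33.3, 313.0, 323.0, 350.0),
  stringsAsFactors = FALSE
)

# xtabs sums 'results' within each level of 'experiment'.
tab <- xtabs(results ~ experiment, data = DF)
sums <- with(DF, tapply(results, experiment, sum))
print(tab)

# Compare the plain numbers, ignoring the table/array classes.
print(all.equal(as.vector(tab), as.vector(sums)))
```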
Jilber Urbina answered Sep 20 '22 at 18:09
