
Aggregate data frame by date and apply different functions to corresponding columns?

I have the following data frame "DF" which is part of a much larger one:

             X1  X2            X3 X4 X5
4468 2010-03-24   3  1.000000e+00  1  2
7662 2010-03-24   9  3.000000e+00  2  1
1272 2010-03-25   8  2.000000e+00  1  1
1273 2010-03-26   9  0.000000e+00  1  1
1274 2010-03-27   8  0.000000e+00  1  1
4469 2010-03-28   4  0.000000e+00  1  2
7663 2010-03-28   4  3.000000e+00  3  1
8734 2010-03-28   7  4.000000e+00  2  3
1275 2010-03-29   8  0.000000e+00  1  1

As you can see, the first column contains a date. I want to transform this data frame into a new one, "DF2", with only one row per date and the following column values:

X2, the average 
X3, the sum
X4, the maximum

of all values that share that date. X5 is not relevant and can be removed. This would be the result:

             X1  X2            X3 X4
7662 2010-03-24   6  4.000000e+00  2  
1272 2010-03-25   8  2.000000e+00  1  
1273 2010-03-26   9  0.000000e+00  1  
1274 2010-03-27   8  0.000000e+00  1  
8734 2010-03-28   5  7.000000e+00  3  
1275 2010-03-29   8  0.000000e+00  1  

Does anyone know how to accomplish this? Help would be much appreciated!

Asked by MB123 on May 13 '13 at 16:05




1 Answer

DF <- read.table(text="             X1  X2            X3 X4 X5
4468 2010-03-24   3  1.000000e+00  1  2
7662 2010-03-24   9  3.000000e+00  2  1
1272 2010-03-25   8  2.000000e+00  1  1
1273 2010-03-26   9  0.000000e+00  1  1
1274 2010-03-27   8  0.000000e+00  1  1
4469 2010-03-28   4  0.000000e+00  1  2
7663 2010-03-28   4  3.000000e+00  3  1
8734 2010-03-28   7  4.000000e+00  2  3
1275 2010-03-29   8  0.000000e+00  1  1",header=TRUE)

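library(data.table)

# Convert to a data.table so the columns can be aggregated by group
DT <- as.data.table(DF)

# One row per date (X1): mean of X2, sum of X3, max of X4; X5 is dropped
DT[, list(X2 = mean(X2), X3 = sum(X3), X4 = max(X4)), by = X1]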

#            X1 X2 X3 X4
# 1: 2010-03-24  6  4  2
# 2: 2010-03-25  8  2  1
# 3: 2010-03-26  9  0  1
# 4: 2010-03-27  8  0  1
# 5: 2010-03-28  5  7  3
# 6: 2010-03-29  8  0  1
Answered by Roland on Nov 07 '22 at 18:11
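
For readers who would rather not depend on data.table, the same per-date aggregation can probably be done in base R as well. The following is only a sketch of one way to do it, not part of the answer above: it aggregates each column separately with `aggregate()` and then joins the pieces on X1 with `merge()`. The intermediate names `avg_x2`, `sum_x3` and `max_x4` are made up for illustration, and the sample data is the one from the question.

```r
# Sample data from the question; the unnamed first field becomes the row names
DF <- read.table(text = "X1 X2 X3 X4 X5
4468 2010-03-24 3 1 1 2
7662 2010-03-24 9 3 2 1
1272 2010-03-25 8 2 1 1
1273 2010-03-26 9 0 1 1
1274 2010-03-27 8 0 1 1
4469 2010-03-28 4 0 1 2
7663 2010-03-28 4 3 3 1
8734 2010-03-28 7 4 2 3
1275 2010-03-29 8 0 1 1", header = TRUE)

# Aggregate each column by date with its own function
# (avg_x2, sum_x3, max_x4 are illustrative names, not from the original post)
avg_x2 <- aggregate(X2 ~ X1, data = DF, FUN = mean)
sum_x3 <- aggregate(X3 ~ X1, data = DF, FUN = sum)
max_x4 <- aggregate(X4 ~ X1, data = DF, FUN = max)

# Join the three results on the date column; X5 is never selected, so it is dropped
DF2 <- Reduce(function(a, b) merge(a, b, by = "X1"),
              list(avg_x2, sum_x3, max_x4))

print(DF2)
```

This makes one pass over the data per column, so on a much larger data frame the data.table call in the answer is likely to be faster. The base R version mainly avoids the extra package.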