I have the following data frame "DF" which is part of a much larger one:
             X1 X2           X3 X4 X5
4468 2010-03-24  3 1.000000e+00  1  2
7662 2010-03-24  9 3.000000e+00  2  1
1272 2010-03-25  8 2.000000e+00  1  1
1273 2010-03-26  9 0.000000e+00  1  1
1274 2010-03-27  8 0.000000e+00  1  1
4469 2010-03-28  4 0.000000e+00  1  2
7663 2010-03-28  4 3.000000e+00  3  1
8734 2010-03-28  7 4.000000e+00  2  3
1275 2010-03-29  8 0.000000e+00  1  1
As you can see, the first column (X1) contains a date; the leading numbers are row names. I want to transform this data frame into a new one, "DF2", with only one row per date, where each column summarizes all rows that share that date:
X2, the average
X3, the sum
X4, the maximum
X5 is not relevant and can be removed. This would be the result:
             X1 X2           X3 X4
7662 2010-03-24  6 4.000000e+00  2
1272 2010-03-25  8 2.000000e+00  1
1273 2010-03-26  9 0.000000e+00  1
1274 2010-03-27  8 0.000000e+00  1
8734 2010-03-28  5 7.000000e+00  3
1275 2010-03-29  8 0.000000e+00  1
Does anyone know how to accomplish this? Help would be much appreciated!
Note that aggregate applies the function to unnamed vectors rather than to named columns of a data.frame. Referring to the data.frame's column names inside the function will therefore not work, nor will column references such as s.d.f[,1]. The most basic uses of aggregate involve base functions such as mean and sd.
The syntax of the R aggregate function depends on the input data. There are three possible input types, each handled by its own method with its own arguments: a data frame, a formula, and a time series object.
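To make the three input types concrete, here is a minimal sketch. It uses the chickwts and AirPassengers datasets that ship with R; these are my choice for illustration and do not come from the text above.

```r
# Data frame input: pass the columns to summarize and a list of grouping vectors.
by_df <- aggregate(chickwts["weight"], by = list(feed = chickwts$feed), FUN = mean)

# Formula input: response ~ grouping variable(s), with the data frame in `data`.
by_formula <- aggregate(weight ~ feed, data = chickwts, FUN = mean)

# Time series input: AirPassengers is monthly, so nfrequency = 1 collapses
# each year into a single value (here, the yearly total).
yearly <- aggregate(AirPassengers, nfrequency = 1, FUN = sum)

print(by_df)
print(by_formula)
print(yearly)
```

The data frame and formula calls should give the same group means; the formula form is usually shorter to write when the grouping variables live in the same data frame.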
Now you can use aggregate to compute the sum of weight for each combination of the two grouping variables, either with the by argument or with a formula:
aggregate(df_2$weight, by = list(df_2$feed, df_2$cat_var), FUN = sum)
aggregate(weight ~ feed + cat_var, data = df_2, FUN = sum)
Hence, one can reproduce the aggregate functionality with a for loop that runs over the unique values of the variable passed as by, and an sapply call that applies the function passed as FUN to each column of the subset sub.data.frame.
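The loop described above could look roughly like the sketch below. The helper name aggregate_loop is hypothetical (it is not part of base R), it handles only a single grouping vector, and it assumes FUN returns one value per column. The chickwts dataset is again just an illustration.

```r
# Hypothetical helper: a simplified stand-in for
# aggregate(x, by = list(by), FUN = FUN) with one grouping vector.
aggregate_loop <- function(x, by, FUN) {
  groups <- sort(unique(by))
  rows <- vector("list", length(groups))
  for (i in seq_along(groups)) {
    # Rows belonging to the current group
    sub.data.frame <- x[by == groups[i], , drop = FALSE]
    # Apply FUN to every column of the subset
    rows[[i]] <- sapply(sub.data.frame, FUN)
  }
  # Stack the per-group results and attach the group labels
  data.frame(Group.1 = groups, do.call(rbind, rows))
}

manual  <- aggregate_loop(chickwts["weight"], chickwts$feed, mean)
builtin <- aggregate(chickwts["weight"], by = list(chickwts$feed), FUN = mean)
print(manual)
print(builtin)
```

This is only meant to show the mechanics; aggregate itself also handles several grouping variables, missing values, and functions that return more than one value.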
DF <- read.table(text=" X1 X2 X3 X4 X5
4468 2010-03-24 3 1.000000e+00 1 2
7662 2010-03-24 9 3.000000e+00 2 1
1272 2010-03-25 8 2.000000e+00 1 1
1273 2010-03-26 9 0.000000e+00 1 1
1274 2010-03-27 8 0.000000e+00 1 1
4469 2010-03-28 4 0.000000e+00 1 2
7663 2010-03-28 4 3.000000e+00 3 1
8734 2010-03-28 7 4.000000e+00 2 3
1275 2010-03-29 8 0.000000e+00 1 1",header=TRUE)
library(data.table)
DT <- as.data.table(DF)
# One row per date (X1): mean of X2, sum of X3, max of X4; X5 is left out
DT[, list(X2 = mean(X2), X3 = sum(X3), X4 = max(X4)), by = X1]
# X1 X2 X3 X4
# 1: 2010-03-24 6 4 2
# 2: 2010-03-25 8 2 1
# 3: 2010-03-26 9 0 1
# 4: 2010-03-27 8 0 1
# 5: 2010-03-28 5 7 3
# 6: 2010-03-29 8 0 1
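If data.table is not available, the same result can be obtained in base R. The following is a sketch of one possible approach (split the rows by date, summarize each piece, and bind the pieces back together), not the method from the answer above. The data is repeated so the block runs on its own.

```r
DF <- read.table(text = "X1 X2 X3 X4 X5
4468 2010-03-24 3 1.000000e+00 1 2
7662 2010-03-24 9 3.000000e+00 2 1
1272 2010-03-25 8 2.000000e+00 1 1
1273 2010-03-26 9 0.000000e+00 1 1
1274 2010-03-27 8 0.000000e+00 1 1
4469 2010-03-28 4 0.000000e+00 1 2
7663 2010-03-28 4 3.000000e+00 3 1
8734 2010-03-28 7 4.000000e+00 2 3
1275 2010-03-29 8 0.000000e+00 1 1", header = TRUE)

# Split by date, compute one summary row per date, then stack the rows.
DF2 <- do.call(rbind, lapply(split(DF, DF$X1), function(d) {
  data.frame(X1 = d$X1[1],
             X2 = mean(d$X2),  # average
             X3 = sum(d$X3),   # sum
             X4 = max(d$X4))   # maximum
}))
rownames(DF2) <- NULL  # drop the date-based row names added by rbind
print(DF2)
```

Unlike the desired output in the question, the original row names (7662, 1272, ...) are not kept here, since a summary row no longer corresponds to a single input row.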