I have a dataframe:
dat<- data.frame(date = c("2015-01-01","2015-01-01","2015-01-01", "2015-01-01","2015-02-02","2015-02-02","2015-02-02","2015-02-02","2015-02-02"), val= c(10,20,30,50,300,100,200,200,400), type= c("A","A","B","C","A","A","B","C","C") )
dat
date val type
1 2015-01-01 10 A
2 2015-01-01 20 A
3 2015-01-01 30 B
4 2015-01-01 50 C
5 2015-02-02 300 A
6 2015-02-02 100 A
7 2015-02-02 200 B
8 2015-02-02 200 C
9 2015-02-02 400 C
and I would like to have one row for each day with averages by type so the output would be:
Date A B C
2015-01-01 15 30 50
2015-02-02 200 200 300
additionally how would I get the counts so the results are:
Date A B C
2015-01-01 2 1 1
2015-02-02 2 1 2
library(reshape2)
dcast(data = dat, formula = date ~ type, fun.aggregate = mean, value.var = "val")
# date A B C
# 1 2015-01-01 15 30 50
# 2 2015-02-02 200 200 300
With dcast
, the LHS of the formula defines rows, the RHS defines columns, the value.var
is the name of the column that becomes values, and the fun.aggregate
is how those values are computed. The default fun.aggregate
is length
, i.e., the number of values. You asked for the average, so we use mean
. You could also do min
, max
, sd
, IQR
, or any function that takes a vector and returns a single value.
You may also use table
for the updated question
table(dat[c(1,3)])
# type
#date A B C
#2015-01-01 2 1 1
#2015-02-02 2 1 2
For the first question, I think @Gregor's solution is the best (so far), a possible option with dplyr/tidyr
would be
library(dplyr)
library(tidyr)
dat %>%
group_by(date,type) %>%
summarise(val=mean(val)) %>%
spread(type, val)
Or a base R
option would be (nchar=50
and the dcast(..
nchar=44
. So not so bad :-))
with(dat, tapply(val, list(date, type), FUN=mean))
# A B C
#2015-01-01 15 30 50
#2015-02-02 200 200 300
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With