Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pivoting data in R

Tags:

r

I have a dataframe:

dat<- data.frame(date = c("2015-01-01","2015-01-01","2015-01-01", "2015-01-01","2015-02-02","2015-02-02","2015-02-02","2015-02-02","2015-02-02"), val= c(10,20,30,50,300,100,200,200,400), type= c("A","A","B","C","A","A","B","C","C") )
dat

       date val type
1 2015-01-01  10    A
2 2015-01-01  20    A
3 2015-01-01  30    B
4 2015-01-01  50    C
5 2015-02-02 300    A
6 2015-02-02 100    A
7 2015-02-02 200    B
8 2015-02-02 200    C
9 2015-02-02 400    C

and I would like to have one row for each day with averages by type so the output would be:

Date           A     B     C
2015-01-01    15     30    50
2015-02-02    200     200   300

additionally how would I get the counts so the results are:

Date           A     B     C
2015-01-01    2      1     1
2015-02-02    2      1     2
like image 910
user3022875 Avatar asked Nov 30 '22 17:11

user3022875


2 Answers

library(reshape2)
dcast(data = dat, formula = date ~ type, fun.aggregate = mean, value.var = "val")

#         date   A   B   C
# 1 2015-01-01  15  30  50
# 2 2015-02-02 200 200 300

With dcast, the LHS of the formula defines rows, the RHS defines columns, the value.var is the name of the column that becomes values, and the fun.aggregate is how those values are computed. The default fun.aggregate is length, i.e., the number of values. You asked for the average, so we use mean. You could also do min, max, sd, IQR, or any function that takes a vector and returns a single value.

like image 74
Gregor Thomas Avatar answered Dec 22 '22 11:12

Gregor Thomas


You may also use table for the updated question

  table(dat[c(1,3)])
  #            type
  #date       A B C
  #2015-01-01 2 1 1
  #2015-02-02 2 1 2

For the first question, I think @Gregor's solution is the best (so far), a possible option with dplyr/tidyr would be

 library(dplyr)
 library(tidyr)
 dat %>%
    group_by(date,type) %>%
    summarise(val=mean(val)) %>% 
    spread(type, val)

Or a base R option would be (nchar=50 and the dcast(.. nchar=44. So not so bad :-))

  with(dat, tapply(val, list(date, type), FUN=mean))
  #            A   B   C
  #2015-01-01  15  30  50
  #2015-02-02 200 200 300
like image 30
akrun Avatar answered Dec 22 '22 12:12

akrun