Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to create a pivot table in R with multiple (3+) variables

Tags:

r

reshape

I am having problems in create a pivot table with a data frame like this:

c1   c2          c3         c4
E   5.76         201    A la vista
E   47530.71     201    A la vista
E   82.85        201    A la vista
L   11376.55     201    A la vista
E   6683.37      203    A la vista
E   66726.52     203    A la vista
E   2.39         203    A la vista
E   79066.07     202    Montoxv_a60d
E   14715.71     202    Montoxv_a60d
E   22661.78     202    Montoxv_a60d
L   81146.25     124    Montoxv_a90d
L   471730.2     124    Montoxv_a186d
E   667812.84    124    Montoxv_a186d

My problem is that I don't know how to create in R a pivot table or summary table with four variables, considering for the final table in the rows, the levels of c1 and c3 and as columns the levels of c4. The values of c2 variable must be aggregated by sum for each level considered in rows. I would like to get something like this:

       A la vista   Montoxv_a60d   Montoxv_a186d  Montoxv_a90d
E 201    47619.32       0               0               0  
E 203    73412.28       0               0               0 
E 202    0           116443.56          0               0      
E 124    0              0            667812.84          0 
L 201    11376.55       0               0               0
L 124    0              0            471730.2         81146.25 
like image 215
Duck Avatar asked Sep 04 '13 20:09

Duck


4 Answers

You can do this with dcast from the reshape2 package:

dcast(mydata, c1 + c3 ~ c4, value.var="c2", fun.aggregate=sum)

For example:

library(reshape2)
# reproducible version of your data
mydata = read.csv(text="c1,c2,c3,c4
    E,5.76,201,A la vista
    E,47530.71,201,A la vista
    E,82.85,201,A la vista
    L,11376.55,201,A la vista
    E,6683.37,203,A la vista
    E,66726.52,203,A la vista
    E,2.39,203,A la vista
    E,79066.07,202,Montoxv_a60d
    E,14715.71,202,Montoxv_a60d
    E,22661.78,202,Montoxv_a60d
    L,81146.25,124,Montoxv_a90d
    L,471730.2,124,Montoxv_a186d
    E,667812.84,124,Montoxv_a186d", header=TRUE)
result = dcast(mydata, c1 + c3 ~ c4, value.var="c2", fun.aggregate=sum)

produces:

  c1  c3 A la vista Montoxv_a186d Montoxv_a60d Montoxv_a90d
1  E 124       0.00      667812.8          0.0         0.00
2  E 201   47619.32           0.0          0.0         0.00
3  E 202       0.00           0.0     116443.6         0.00
4  E 203   73412.28           0.0          0.0         0.00
5  L 124       0.00      471730.2          0.0     81146.25
6  L 201   11376.55           0.0          0.0         0.00
like image 85
David Robinson Avatar answered Sep 28 '22 02:09

David Robinson


This can also be easily produced by the pivottabler package - using either the one-line quick-pivot function or the more verbose syntax:

df <- read.csv(text="c1,c2,c3,c4
    E,5.76,201,A la vista
    E,47530.71,201,A la vista
    E,82.85,201,A la vista
    L,11376.55,201,A la vista
    E,6683.37,203,A la vista
    E,66726.52,203,A la vista
    E,2.39,203,A la vista
    E,79066.07,202,Montoxv_a60d
    E,14715.71,202,Montoxv_a60d
    E,22661.78,202,Montoxv_a60d
    L,81146.25,124,Montoxv_a90d
    L,471730.2,124,Montoxv_a186d
    E,667812.84,124,Montoxv_a186d", header=TRUE)

# quick pivot syntax
library(pivottabler)
qhpvt(df, c("c1","c3"), "c4", "sum(c2)", totals="NONE")

# verbose syntax
library(pivottabler)
pt <- PivotTable$new()
pt$addData(df) 
pt$addColumnDataGroups("c4", addTotal=FALSE)
pt$addRowDataGroups("c1", addTotal=FALSE)
pt$addRowDataGroups("c3", addTotal=FALSE)
pt$defineCalculation(calculationName="calc1", summariseExpression="sum(c2)")
pt$renderPivot()

Output:

output

More info about the pivottabler package at: http://pivottabler.org.uk/articles/v01-introduction.html

NB: I am the package author.

like image 39
cbailiss Avatar answered Sep 28 '22 02:09

cbailiss


The request: "The values of c2 variable must be aggregated by sum for each level" can be decomposed into an aggregation step that use tapply to create a 3-dimensional table object which can then be presented as flattened (with ftable) with 2 dimensions on the rows and one dimension for columns:

ftable(with(mydata, tapply(c2, list(c1,c3,c4), sum) ) )

           A la vista Montoxv_a186d Montoxv_a60d Montoxv_a90d
                                                             
    E 124          NA     667812.84           NA           NA
      201    47619.32            NA           NA           NA
      202          NA            NA    116443.56           NA
      203    73412.28            NA           NA           NA
    L 124          NA     471730.20           NA     81146.25
      201    11376.55            NA           NA           NA
      202          NA            NA           NA           NA
      203          NA            NA           NA           NA
like image 38
IRTFM Avatar answered Sep 28 '22 02:09

IRTFM


Here are a few more options, two in base R and one using the more recent "dplyr" and "tidyr" packages.

Base R's reshape can't handle aggregation, so you need to resort to other functions (for example, aggregate) before you can do the reshaping.

reshape(
  aggregate(c2 ~ c1 + c3 + c4, mydata, sum), 
  direction = "wide", idvar = c("c1", "c3"), timevar = "c4")
#      c1  c3 c2.A la vista c2.Montoxv_a186d c2.Montoxv_a60d c2.Montoxv_a90d
# 1     E 201      47619.32               NA              NA              NA
# 2     L 201      11376.55               NA              NA              NA
# 3     E 203      73412.28               NA              NA              NA
# 4     E 124            NA         667812.8              NA              NA
# 5     L 124            NA         471730.2              NA        81146.25
# 6     E 202            NA               NA        116443.6              NA

If your aggregation only involves sum, you can also use xtabs to do the aggregation. Since you have multiple values on the RHS of the formula, you'll end up with a multi-dimensional array, but that can easily be coerced into a rectangular form by using ftable (as was done by @BondedDust in his answer). Note that the output using ftable is a little different from others in that it returns all combinations of grouping variables by default, even where there would otherwise be fully empty rows.

ftable(xtabs(c2 ~ c1 + c3 + c4, mydata))
#           c4 A la vista Montoxv_a186d Montoxv_a60d Montoxv_a90d
# c1    c3                                                       
#     E 124          0.00     667812.84         0.00         0.00
#       201      47619.32          0.00         0.00         0.00
#       202          0.00          0.00    116443.56         0.00
#       203      73412.28          0.00         0.00         0.00
#     L 124          0.00     471730.20         0.00     81146.25
#       201      11376.55          0.00         0.00         0.00
#       202          0.00          0.00         0.00         0.00
#       203          0.00          0.00         0.00         0.00

Finally, you can also use the functions in "tidyr" and "dplyr" which offer similar functionality to the tools in "reshape" and "reshape2", but with a slightly different "grammar".

library(tidyr)
library(dplyr)
mydata %>%                     ## The source dataset
  group_by(c1, c3, c4) %>%     ## Grouping variables
  summarise(c2 = sum(c2)) %>%  ## aggregation of the c2 column
  ungroup() %>%                ## spread doesn't seem to like groups
  spread(c4, c2)               ## spread makes the data wide
# Source: local data frame [6 x 6]
# 
#      c1  c3 A la vista Montoxv_a186d Montoxv_a60d Montoxv_a90d
# 1     E 124         NA      667812.8           NA           NA
# 2     E 201   47619.32            NA           NA           NA
# 3     E 202         NA            NA     116443.6           NA
# 4     E 203   73412.28            NA           NA           NA
# 5     L 124         NA      471730.2           NA     81146.25
# 6     L 201   11376.55            NA           NA           NA
like image 23
A5C1D2H2I1M1N2O1R2T1 Avatar answered Sep 28 '22 01:09

A5C1D2H2I1M1N2O1R2T1