Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how to pivot/unpivot (cast/melt) data frame? [duplicate]

How can I 'unpivot' a table? What is the proper technical term for this?

UPDATE: The term is called melt

I have a data frame for countries and data for each year

Country     2001    2002    2003
Nigeria     1       2       3
UK          2       NA       1

And I want to have something like

Country    Year    Value
Nigeria    2001    1
Nigeria    2002    2
Nigeria    2003    3
UK         2001    2
UK         2002    NA
UK         2003    1
like image 311
pedrosaurio Avatar asked Nov 02 '11 12:11

pedrosaurio


People also ask

How do you pivot Unpivot in Excel?

Select the columns you do want to unpivot. To select more than one column contiguously or discontiguously, press Shift+Click or CTRL+Click on each subsequent column. Select Transform > Unpivot Only Selected Columns.

What does Pivot_longer do in R?

pivot_longer() makes datasets longer by increasing the number of rows and decreasing the number of columns.

What is Pivot vs Unpivot?

PIVOT carries out an aggregation and merges possible multiple rows into a single row in the output. UNPIVOT doesn't reproduce the original table-valued expression result because rows have been merged.


3 Answers

I still can't believe I beat Andrie with an answer. :)

> library(reshape) > my.df <- read.table(text = "Country     2001    2002    2003    + Nigeria     1       2       3    + UK          2       NA       1", header = TRUE) > my.result <- melt(my.df, id = c("Country")) > my.result[order(my.result$Country),]      Country variable value    1 Nigeria    X2001     1    3 Nigeria    X2002     2    5 Nigeria    X2003     3    2      UK    X2001     2    4      UK    X2002    NA    6      UK    X2003     1 
like image 164
Roman Luštrik Avatar answered Sep 24 '22 18:09

Roman Luštrik


The base R reshape approach for this problem is pretty ugly, particularly since the names aren't in a form that reshape likes. It would be something like the following, where the first setNames line modifies the column names into something that reshape can make use of.

reshape(   setNames(mydf, c("Country", paste0("val.", c(2001, 2002, 2003)))),    direction = "long", idvar = "Country", varying = 2:ncol(mydf),    sep = ".", new.row.names = seq_len(prod(dim(mydf[-1])))) 

A better alternative in base R is to use stack, like this:

cbind(mydf[1], stack(mydf[-1])) #   Country values  ind # 1 Nigeria      1 2001 # 2      UK      2 2001 # 3 Nigeria      2 2002 # 4      UK     NA 2002 # 5 Nigeria      3 2003 # 6      UK      1 2003 

There are also new tools for reshaping data now available, like the "tidyr" package, which gives us gather. Of course, the tidyr:::gather_.data.frame method just calls reshape2::melt, so this part of my answer doesn't necessarily add much except introduce the newer syntax that you might be encountering in the Hadleyverse.

library(tidyr) gather(mydf, year, value, `2001`:`2003`) ## Note the backticks #   Country year value # 1 Nigeria 2001     1 # 2      UK 2001     2 # 3 Nigeria 2002     2 # 4      UK 2002    NA # 5 Nigeria 2003     3 # 6      UK 2003     1 

All three options here would need reordering of rows if you want the row order you showed in your question.


A fourth option would be to use merged.stack from my "splitstackshape" package. Like base R's reshape, you'll need to modify the column names to something that includes a "variable" and "time" indicator.

library(splitstackshape) merged.stack(   setNames(mydf, c("Country", paste0("V.", 2001:2003))),   var.stubs = "V", sep = ".") #    Country .time_1  V # 1: Nigeria    2001  1 # 2: Nigeria    2002  2 # 3: Nigeria    2003  3 # 4:      UK    2001  2 # 5:      UK    2002 NA # 6:      UK    2003  1 

Sample data

 mydf <- structure(list(Country = c("Nigeria", "UK"), `2001` = 1:2, `2002` = c(2L,       NA), `2003` = c(3L, 1L)), .Names = c("Country", "2001", "2002",                     "2003"), row.names = 1:2, class = "data.frame")    
like image 35
A5C1D2H2I1M1N2O1R2T1 Avatar answered Sep 21 '22 18:09

A5C1D2H2I1M1N2O1R2T1


You can use the melt command from the reshape package. See here: http://www.statmethods.net/management/reshape.html

Probably something like melt(myframe, id=c('Country'))

like image 35
nicolaskruchten Avatar answered Sep 25 '22 18:09

nicolaskruchten