Reshape wide format, to multi-column long format

Tags:

I want to reshape a wide format dataset that has multiple tests which are measured at 3 time points:

   ID   Test Year   Fall Spring Winter
    1   1   2008    15      16      19
    1   1   2009    12      13      27
    1   2   2008    22      22      24
    1   2   2009    10      14      20
    2   1   2008    12      13      25
    2   1   2009    16      14      21
    2   2   2008    13      11      29
    2   2   2009    23      20      26
    3   1   2008    11      12      22
    3   1   2009    13      11      27
    3   2   2008    17      12      23
    3   2   2009    14      9       31

into a data set that separates the tests by column but converts the measurement time into long format, for each of the new columns like this:

    ID  Year    Time        Test1 Test2
    1   2008    Fall        15      22
    1   2008    Spring      16      22
    1   2008    Winter      19      24
    1   2009    Fall        12      10
    1   2009    Spring      13      14
    1   2009    Winter      27      20
    2   2008    Fall        12      13
    2   2008    Spring      13      11
    2   2008    Winter      25      29
    2   2009    Fall        16      23
    2   2009    Spring      14      20
    2   2009    Winter      21      26
    3   2008    Fall        11      17
    3   2008    Spring      12      12
    3   2008    Winter      22      23
    3   2009    Fall        13      14
    3   2009    Spring      11      9
    3   2009    Winter      27      31

I have unsuccessfully tried to use reshape and melt. Existing posts address transforming to single column outcome.

390

asked Mar 27 '13 20:03

Sam

2 Answers

Using reshape2:

# Thanks to Ista for helping with direct naming using "variable.name"
df.m <- melt(df, id.var = c("ID", "Test", "Year"), variable.name = "Time")
df.m <- transform(df.m, Test = paste0("Test", Test))
dcast(df.m, ID + Year + Time ~ Test, value.var = "value")

Update: Using data.table melt/cast from versions >= 1.9.0:

data.table from versions 1.9.0 imports reshape2 package and implements fast melt and dcast methods in C for data.tables. A comparison of speed on bigger data is shown below.

For more info regarding NEWS, go here.

require(data.table) ## ver. >=1.9.0
require(reshape2)

dt <- as.data.table(df, key=c("ID", "Test", "Year"))
dt.m <- melt(dt, id.var = c("ID", "Test", "Year"), variable.name = "Time")
dt.m[, Test := paste0("Test", Test)]
dcast.data.table(dt.m, ID + Year + Time ~ Test, value.var = "value")

At the moment, you'll have to write dcast.data.table explicitly as it's not a S3 generic in reshape2 yet.

Benchmarking on bigger data:

# generate data:
set.seed(45L)
DT <- data.table(ID = sample(1e2, 1e7, TRUE), 
        Test = sample(1e3, 1e7, TRUE), 
        Year = sample(2008:2014, 1e7,TRUE), 
        Fall = sample(50, 1e7, TRUE), 
        Spring = sample(50, 1e7,TRUE), 
        Winter = sample(50, 1e7, TRUE))
DF <- as.data.frame(DT)

reshape2 timings:

reshape2_melt <- function(df) {
    df.m <- melt(df, id.var = c("ID", "Test", "Year"), variable.name = "Time")
}
# min. of three consecutive runs
system.time(df.m <- reshape2_melt(DF))
#   user  system elapsed 
# 43.319   4.909  48.932 

df.m <- transform(df.m, Test = paste0("Test", Test))

reshape2_cast <- function(df) {
    dcast(df.m, ID + Year + Time ~ Test, value.var = "value")
}
# min. of three consecutive runs
system.time(reshape2_cast(df.m))
#   user  system elapsed 
# 57.728   9.712  69.573

data.table timings:

DT_melt <- function(dt) {
    dt.m <- melt(dt, id.var = c("ID", "Test", "Year"), variable.name = "Time")
}
# min. of three consecutive runs
system.time(dt.m <- reshape2_melt(DT))
#   user  system elapsed 
#  0.276   0.001   0.279 

dt.m[, Test := paste0("Test", Test)]

DT_cast <- function(dt) {
    dcast.data.table(dt.m, ID + Year + Time ~ Test, value.var = "value")
}
# min. of three consecutive runs
system.time(DT_cast(dt.m))
#   user  system elapsed 
# 12.732   0.825  14.006

melt.data.table is ~175x faster than reshape2:::melt and dcast.data.table is ~5x than reshape2:::dcast.

100

answered Sep 20 '22 12:09

Arun

Sticking with base R, this is another good candidate for the "stack + reshape" routine. Assuming our dataset is called "mydf":

mydf.temp <- data.frame(mydf[1:3], stack(mydf[4:6]))
mydf2 <- reshape(mydf.temp, direction = "wide", 
                 idvar=c("ID", "Year", "ind"), 
                 timevar="Test")
names(mydf2) <- c("ID", "Year", "Time", "Test1", "Test2")
mydf2
#    ID Year   Time Test1 Test2
# 1   1 2008   Fall    15    22
# 2   1 2009   Fall    12    10
# 5   2 2008   Fall    12    13
# 6   2 2009   Fall    16    23
# 9   3 2008   Fall    11    17
# 10  3 2009   Fall    13    14
# 13  1 2008 Spring    16    22
# 14  1 2009 Spring    13    14
# 17  2 2008 Spring    13    11
# 18  2 2009 Spring    14    20
# 21  3 2008 Spring    12    12
# 22  3 2009 Spring    11     9
# 25  1 2008 Winter    19    24
# 26  1 2009 Winter    27    20
# 29  2 2008 Winter    25    29
# 30  2 2009 Winter    21    26
# 33  3 2008 Winter    22    23
# 34  3 2009 Winter    27    31

answered Sep 18 '22 12:09

A5C1D2H2I1M1N2O1R2T1

Related questions
                            
                                R legend pch mix of character and numeric
                            
                                How to remove default axis from the plot produced by boxplot()?
                            
                                Keeping zero count combinations when aggregating with data.table
                            
                                Fastest way to count occurrences of each unique element
                            
                                Pretty print sql code from separate file with knitr
                            
                                Create bookmarks into a PDF file via command line
                            
                                "read_excel" in a Shiny app
                            
                                data.table in R - multiple filters using multiple keys - binary search
                            
                                How to print text and variables in a single line in r
                            
                                Match/group duplicate rows (indices)
                            
                                RStudio gives "Incorrect function" when setting git as Version control
                            
                                Embed Rmarkdown with Rmarkdown, without knitr evaluation
                            
                                dplyr count number of one specific value of variable
                            
                                dplyr::n() returns "Error: This function should not be called directly"
                            
                                Efficient calculation of var-covar matrix in R
                            
                                How to change the font of the main title in plot()
                            
                                Plotting google map with ggplot in R
                            
                                R: numeric vector becoming non-numeric after cbind of dates
                            
                                plots generated by 'plot' and 'ggplot' side-by-side
                            
                                strptime, as.POSIXct and as.Date return unexpected NA

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Reshape wide format, to multi-column long format

Tags:

r

reshape

melt

reshape2

Sam

People also ask