 

Reshape data from long to wide format by a variable, and rename columns

How can I do this more elegantly with plyr, reshape2, the aggregate() function and/or data.table?

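library(plyr)  # for rename()

set.seed(1)
# 10 individuals, each measured once under "Treat" and once under "Cont"
x <- data.frame(Ind = paste0("Ind", 1:10), Treatment = c(rep("Treat",10),rep("Cont",10)),
                value = rnorm(20,60,8))

# Take each treatment separately and rename "value" after it
tr <- subset(x, Treatment == "Treat")
tr <- rename(tr, c("value" = "Treat"))

ct <- subset(x, Treatment == "Cont")
ct <- rename(ct, c("value" = "Cont"))

# Drop the Treatment column (column 2) and join the two halves by Ind
merge(ct[-2], tr[-2], by = "Ind", all = TRUE, sort = FALSE)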

# Output (a data.frame):
#      Ind     Cont    Treat
# 1   Ind1 72.09425 54.98837
# 2   Ind2 63.11875 61.46915
# 3   Ind3 55.03008 53.31497
# 4   Ind4 42.28240 72.76225
# 5   Ind5 68.99945 62.63606
# 6   Ind6 59.64053 53.43625
# 7   Ind7 59.87048 63.89943
# 8   Ind8 67.55069 65.90660
# 9   Ind9 66.56977 64.60625
# 10 Ind10 64.75121 57.55689
Mikko asked Apr 16 '13 09:04





1 Answer

To add to your options...

Here's our starting data:

set.seed(1) # Nice for reproducible examples
x <- data.frame(Ind = paste0("Ind", 1:10), 
                Treatment = c(rep("Treat",10),rep("Cont",10)),
                value = rnorm(20,60,8))

xtabs

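Note that the output is a two-way contingency table (class "xtabs"), not a data.frame. xtabs() sums value within each Ind/Treatment cell, which here is simply the single value in that cell, and the rows come out sorted alphabetically (Ind1, Ind10, Ind2, ...).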

xtabs(value ~ Ind + Treatment, x)
#        Treatment
# Ind         Cont    Treat
#   Ind1  72.09425 54.98837
#   Ind10 64.75121 57.55689
#   Ind2  63.11875 61.46915
#   Ind3  55.03008 53.31497
#   Ind4  42.28240 72.76225
#   Ind5  68.99945 62.63606
#   Ind6  59.64053 53.43625
#   Ind7  59.87048 63.89943
#   Ind8  67.55069 65.90660
#   Ind9  66.56977 64.60625
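If you need a data.frame rather than a table, one option (a sketch, not the only way) is as.data.frame.matrix(), which keeps the two-dimensional layout; plain as.data.frame() on a table would instead turn it back into long format with a Freq column. The object names `tab` and `wide` are just illustrative.

```r
set.seed(1)
x <- data.frame(Ind = paste0("Ind", 1:10),
                Treatment = c(rep("Treat", 10), rep("Cont", 10)),
                value = rnorm(20, 60, 8))

# Two-way table: rows = Ind, columns = Treatment, cells = summed value
tab <- xtabs(value ~ Ind + Treatment, x)

# Keep the wide layout; the Ind labels end up as row names
wide <- as.data.frame.matrix(tab)

# Move the row names into a proper Ind column and reset the row names
wide <- data.frame(Ind = rownames(wide), wide, row.names = NULL)
wide
```

The rows keep the alphabetical order from xtabs(), so reorder them afterwards if the original Ind order matters.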

reshape

reshape(x, direction = "wide", idvar="Ind", timevar="Treatment")
#      Ind value.Treat value.Cont
# 1   Ind1    54.98837   72.09425
# 2   Ind2    61.46915   63.11875
# 3   Ind3    53.31497   55.03008
# 4   Ind4    72.76225   42.28240
# 5   Ind5    62.63606   68.99945
# 6   Ind6    53.43625   59.64053
# 7   Ind7    63.89943   59.87048
# 8   Ind8    65.90660   67.55069
# 9   Ind9    64.60625   66.56977
# 10 Ind10    57.55689   64.75121

To rename the columns in the same step, wrap the reshape() call in setNames():

setNames(reshape(x, direction = "wide", idvar="Ind", timevar="Treatment"), 
         c("Ind", "Treat", "Cont"))
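Hard-coding the new names in setNames() relies on reshape() emitting the columns in the order the Treatment values first appear (Treat, then Cont here). A sketch of an order-independent alternative strips the "value." prefix that reshape() adds (its default separator is "."), so each column keeps its own treatment label. The name `wide` is illustrative.

```r
set.seed(1)
x <- data.frame(Ind = paste0("Ind", 1:10),
                Treatment = c(rep("Treat", 10), rep("Cont", 10)),
                value = rnorm(20, 60, 8))

wide <- reshape(x, direction = "wide", idvar = "Ind", timevar = "Treatment")

# reshape() names the new columns "value.<Treatment>"; drop the "value." prefix
names(wide) <- sub("^value\\.", "", names(wide))
wide
```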

split + merge

Again, setNames could be used here, or you can rename the columns afterwards.

temp <- split(x[-2], x$Treatment)
merge(temp[[1]], temp[[2]], by = "Ind", suffixes = names(temp))
#      Ind valueCont valueTreat
# 1   Ind1  72.09425   54.98837
# 2  Ind10  64.75121   57.55689
# 3   Ind2  63.11875   61.46915
# 4   Ind3  55.03008   53.31497
# 5   Ind4  42.28240   72.76225
# 6   Ind5  68.99945   62.63606
# 7   Ind6  59.64053   53.43625
# 8   Ind7  59.87048   63.89943
# 9   Ind8  67.55069   65.90660
# 10  Ind9  66.56977   64.60625
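Another way to get clean column names out of split + merge is to rename each piece's value column after its group before merging. The sketch below does that with Map() and then folds merge() over the pieces with Reduce(), so the same code would also handle more than two Treatment levels; all = TRUE keeps individuals missing from one group, filled with NA. The name `pieces` is hypothetical.

```r
set.seed(1)
x <- data.frame(Ind = paste0("Ind", 1:10),
                Treatment = c(rep("Treat", 10), rep("Cont", 10)),
                value = rnorm(20, 60, 8))

# One data.frame per treatment, each with columns Ind and value
temp <- split(x[-2], x$Treatment)

# Rename each piece's "value" column to the treatment it came from
pieces <- Map(function(df, nm) setNames(df, c("Ind", nm)), temp, names(temp))

# Merge the pieces pairwise on Ind; all = TRUE keeps unmatched Ind values as NA
wide <- Reduce(function(a, b) merge(a, b, by = "Ind", all = TRUE), pieces)
wide
```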

ddply from plyr

(I'm not a regular plyr user, so I'm not at all sure this is the best approach.)

library(plyr)
ddply(x, .(Ind), summarize, 
      Treat = value[Treatment == "Treat"], 
      Cont = value[Treatment == "Cont"])
#      Ind    Treat     Cont
# 1   Ind1 54.98837 72.09425
# 2  Ind10 57.55689 64.75121
# 3   Ind2 61.46915 63.11875
# 4   Ind3 53.31497 55.03008
# 5   Ind4 72.76225 42.28240
# 6   Ind5 62.63606 68.99945
# 7   Ind6 53.43625 59.64053
# 8   Ind7 63.89943 59.87048
# 9   Ind8 65.90660 67.55069
# 10  Ind9 64.60625 66.56977
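For comparison, the same split-apply-combine logic can be sketched in base R without plyr: split the rows by Ind, build a one-row data.frame per individual, and rbind the results. Like the ddply version, this assumes every Ind has exactly one row per treatment; an individual with a missing treatment would produce a zero-length column, and data.frame() would then fail.

```r
set.seed(1)
x <- data.frame(Ind = paste0("Ind", 1:10),
                Treatment = c(rep("Treat", 10), rep("Cont", 10)),
                value = rnorm(20, 60, 8))

wide <- do.call(rbind, lapply(split(x, x$Ind), function(d) {
  # d holds the rows for one individual: one "Treat" and one "Cont"
  data.frame(Ind   = d$Ind[1],
             Treat = d$value[d$Treatment == "Treat"],
             Cont  = d$value[d$Treatment == "Cont"])
}))
rownames(wide) <- NULL  # reset the row names to 1..n
wide
```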

unstack (as if the options weren't enough!)

unique(data.frame(x[1], unstack(x, value ~ Treatment)))
#      Ind     Cont    Treat
# 1   Ind1 72.09425 54.98837
# 2   Ind2 63.11875 61.46915
# 3   Ind3 55.03008 53.31497
# 4   Ind4 42.28240 72.76225
# 5   Ind5 68.99945 62.63606
# 6   Ind6 59.64053 53.43625
# 7   Ind7 59.87048 63.89943
# 8   Ind8 67.55069 65.90660
# 9   Ind9 66.56977 64.60625
# 10 Ind10 64.75121 57.55689
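The unstack() trick works because unstack() lines up the values by position within each Treatment, and data.frame() recycles the 10-row result against the 20 rows of x[1] before unique() drops the repeated rows. It therefore relies on both groups listing the individuals in the same order. A slightly more defensive sketch checks that assumption first and skips the recycle-then-deduplicate step. The names `ind_treat` and `ind_cont` are illustrative.

```r
set.seed(1)
x <- data.frame(Ind = paste0("Ind", 1:10),
                Treatment = c(rep("Treat", 10), rep("Cont", 10)),
                value = rnorm(20, 60, 8))

ind_treat <- x$Ind[x$Treatment == "Treat"]
ind_cont  <- x$Ind[x$Treatment == "Cont"]

# unstack() matches values by position, so both groups must list Ind in the same order
stopifnot(identical(ind_treat, ind_cont))

# unstack() returns one column per Treatment level (Cont, Treat)
wide <- data.frame(Ind = ind_cont, unstack(x, value ~ Treatment))
wide
```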
A5C1D2H2I1M1N2O1R2T1 answered Sep 28 '22 02:09
