How to do this more elegantly with plyr, reshape2, the aggregate function and/or data.table?
library(plyr)
set.seed(1)
x <- data.frame(Ind = paste0("Ind", 1:10), Treatment = c(rep("Treat",10),rep("Cont",10)),
value = rnorm(20,60,8))
tr <- subset(x, Treatment == "Treat")
tr <- rename(tr, c("value" = "Treat"))
ct <- subset(x, Treatment == "Cont")
ct <- rename(ct, c("value" = "Cont"))
merge(ct[-2], tr[-2], by = "Ind", all = TRUE, sort = FALSE)
# Output (a data.frame; do not run):
# Ind Cont Treat
# 1 Ind1 72.09425 54.98837
# 2 Ind2 63.11875 61.46915
# 3 Ind3 55.03008 53.31497
# 4 Ind4 42.28240 72.76225
# 5 Ind5 68.99945 62.63606
# 6 Ind6 59.64053 53.43625
# 7 Ind7 59.87048 63.89943
# 8 Ind8 67.55069 65.90660
# 9 Ind9 66.56977 64.60625
# 10 Ind10 64.75121 57.55689
Data reshaping in R means changing how data is organized into rows and columns. Most data processing in R takes its input as a data frame.
Using tidyr: the gather function converts data from wide to long format, and the spread function converts it from long to wide format.
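A minimal sketch of the long-to-wide direction with tidyr, assuming tidyr is installed. spread() takes the key column (Treatment) and the value column (value); since tidyr 1.0.0, spread() is superseded by pivot_wider(), shown as the second call for comparison.

```r
library(tidyr)

set.seed(1)
x <- data.frame(Ind = paste0("Ind", 1:10),
                Treatment = c(rep("Treat", 10), rep("Cont", 10)),
                value = rnorm(20, 60, 8))

# spread(): one new column per distinct Treatment, filled from value
wide_spread <- spread(x, key = Treatment, value = value)

# pivot_wider(): the newer interface for the same reshape (returns a tibble)
wide_pivot <- pivot_wider(x, names_from = Treatment, values_from = value)
```

Both calls treat every remaining column (here just Ind) as the row identifier.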
To add to your options...
Here's our starting data:
set.seed(1) # Nice for reproducible examples
x <- data.frame(Ind = paste0("Ind", 1:10),
Treatment = c(rep("Treat",10),rep("Cont",10)),
value = rnorm(20,60,8))
xtabs
Note that the output is a matrix, not a data.frame.
xtabs(value ~ Ind + Treatment, x)
# Treatment
# Ind Cont Treat
# Ind1 72.09425 54.98837
# Ind10 64.75121 57.55689
# Ind2 63.11875 61.46915
# Ind3 55.03008 53.31497
# Ind4 42.28240 72.76225
# Ind5 68.99945 62.63606
# Ind6 59.64053 53.43625
# Ind7 59.87048 63.89943
# Ind8 67.55069 65.90660
# Ind9 66.56977 64.60625
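If you do need a data.frame from the xtabs result, one sketch is to go through as.data.frame.matrix(), which keeps the wide layout (plain as.data.frame() on a table returns the long format with a Freq column instead). The names tab and wide are just illustrative.

```r
set.seed(1)
x <- data.frame(Ind = paste0("Ind", 1:10),
                Treatment = c(rep("Treat", 10), rep("Cont", 10)),
                value = rnorm(20, 60, 8))

tab <- xtabs(value ~ Ind + Treatment, x)

# Keep the Ind x Treatment layout as columns of a data.frame
wide <- as.data.frame.matrix(tab)
# The Ind labels end up as row names; move them into a proper column
wide <- data.frame(Ind = rownames(wide), wide, row.names = NULL)
```

Keep in mind that xtabs sums value within each cell, so duplicate Ind/Treatment pairs would be added together and missing pairs show up as 0 rather than NA.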
reshape
reshape(x, direction = "wide", idvar="Ind", timevar="Treatment")
# Ind value.Treat value.Cont
# 1 Ind1 54.98837 72.09425
# 2 Ind2 61.46915 63.11875
# 3 Ind3 53.31497 55.03008
# 4 Ind4 72.76225 42.28240
# 5 Ind5 62.63606 68.99945
# 6 Ind6 53.43625 59.64053
# 7 Ind7 63.89943 59.87048
# 8 Ind8 65.90660 67.55069
# 9 Ind9 64.60625 66.56977
# 10 Ind10 57.55689 64.75121
If you want to change the names at the same time with the reshape option:
setNames(reshape(x, direction = "wide", idvar="Ind", timevar="Treatment"),
c("Ind", "Treat", "Cont"))
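The setNames() call relies on reshape() putting value.Treat before value.Cont, which follows the order in which the Treatment values first appear in the data. A sketch that strips the "value." prefix instead, so the names stay correct whatever the column order:

```r
set.seed(1)
x <- data.frame(Ind = paste0("Ind", 1:10),
                Treatment = c(rep("Treat", 10), rep("Cont", 10)),
                value = rnorm(20, 60, 8))

wide <- reshape(x, direction = "wide", idvar = "Ind", timevar = "Treatment")
# Drop the "value." prefix that reshape() adds (its default sep is ".")
names(wide) <- sub("^value\\.", "", names(wide))
```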
split + merge
Again, setNames could be used here, or you can rename the columns afterwards.
temp <- split(x[-2], x$Treatment)
merge(temp[[1]], temp[[2]], by = "Ind", suffixes = names(temp))
# Ind valueCont valueTreat
# 1 Ind1 72.09425 54.98837
# 2 Ind10 64.75121 57.55689
# 3 Ind2 63.11875 61.46915
# 4 Ind3 55.03008 53.31497
# 5 Ind4 42.28240 72.76225
# 6 Ind5 68.99945 62.63606
# 7 Ind6 59.64053 53.43625
# 8 Ind7 59.87048 63.89943
# 9 Ind8 67.55069 65.90660
# 10 Ind9 66.56977 64.60625
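Another option is to rename each piece before merging, so no suffixes are needed at all. This is a sketch; wrapping merge() in Reduce() is an addition here, and it lets the same code handle more than two treatment groups.

```r
set.seed(1)
x <- data.frame(Ind = paste0("Ind", 1:10),
                Treatment = c(rep("Treat", 10), rep("Cont", 10)),
                value = rnorm(20, 60, 8))

temp <- split(x[-2], x$Treatment)
# Name each piece's value column after its group ("Cont", "Treat")
temp <- Map(function(d, nm) setNames(d, c("Ind", nm)), temp, names(temp))
# Merge the pieces pairwise on Ind
wide <- Reduce(function(a, b) merge(a, b, by = "Ind"), temp)
```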
ddply from plyr
(I'm not a regular "plyr" user, so I'm not at all sure this is the best approach.)
library(plyr)
ddply(x, .(Ind), summarize,
Treat = value[Treatment == "Treat"],
Cont = value[Treatment == "Cont"])
# Ind Treat Cont
# 1 Ind1 54.98837 72.09425
# 2 Ind10 57.55689 64.75121
# 3 Ind2 61.46915 63.11875
# 4 Ind3 53.31497 55.03008
# 5 Ind4 72.76225 42.28240
# 6 Ind5 62.63606 68.99945
# 7 Ind6 53.43625 59.64053
# 8 Ind7 63.89943 59.87048
# 9 Ind8 65.90660 67.55069
# 10 Ind9 64.60625 66.56977
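The question also asks about reshape2. A minimal sketch with its dcast(), assuming reshape2 is installed: the left side of the formula gives the row identifier and the right side gives the new columns.

```r
library(reshape2)

set.seed(1)
x <- data.frame(Ind = paste0("Ind", 1:10),
                Treatment = c(rep("Treat", 10), rep("Cont", 10)),
                value = rnorm(20, 60, 8))

# Ind ~ Treatment: one row per Ind, one column per Treatment level
wide <- dcast(x, Ind ~ Treatment, value.var = "value")
```

data.table provides a dcast() for data.tables with the same formula and value.var arguments.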
unstack (as if the options weren't enough!)
unique(data.frame(x[1], unstack(x, value ~ Treatment)))
# Ind Cont Treat
# 1 Ind1 72.09425 54.98837
# 2 Ind2 63.11875 61.46915
# 3 Ind3 55.03008 53.31497
# 4 Ind4 42.28240 72.76225
# 5 Ind5 68.99945 62.63606
# 6 Ind6 59.64053 53.43625
# 7 Ind7 59.87048 63.89943
# 8 Ind8 67.55069 65.90660
# 9 Ind9 66.56977 64.60625
# 10 Ind10 64.75121 57.55689
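The unique() is there because x[1] has 20 rows while unstack() returns 10, so data.frame() recycles the 10 rows and every row appears twice. A sketch that takes the ids only once instead, assuming each Ind appears exactly once per treatment and in the same order within both groups (true for this example data):

```r
set.seed(1)
x <- data.frame(Ind = paste0("Ind", 1:10),
                Treatment = c(rep("Treat", 10), rep("Cont", 10)),
                value = rnorm(20, 60, 8))

# unstack() splits value by Treatment into one column per level (Cont, Treat)
vals <- unstack(x, value ~ Treatment)
# Pair each column with the ids, taken once; relies on the ordering assumption above
wide <- data.frame(Ind = unique(x$Ind), vals)
```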