This is a basic problem in data analysis that Stata handles in one step.
Create a wide data frame with time-invariant data (x0) and time-varying data for the years 2000 and 2005 (x1, x2):
d1 <- data.frame(subject = c("id1", "id2"),
                 x0 = c("male", "female"),
                 x1_2000 = 1:2,
                 x1_2005 = 5:6,
                 x2_2000 = 1:2,
                 x2_2005 = 5:6
)
which gives:
  subject     x0 x1_2000 x1_2005 x2_2000 x2_2005
1     id1   male       1       5       1       5
2     id2 female       2       6       2       6
I want to reshape it into a panel (long format) so that the data look like this:
  subject     x0 time x1 x2
1     id1   male 2000  1  1
2     id2 female 2000  2  2
3     id1   male 2005  5  5
4     id2 female 2005  6  6
I can do this with reshape:
d2 <- reshape(d1,
              idvar = "subject",
              varying = list(c("x1_2000", "x1_2005"),
                             c("x2_2000", "x2_2005")),
              v.names = c("x1", "x2"),
              times = c(2000, 2005),
              direction = "long",
              sep = "_")
My main concern is that when you have dozens of variables, the above command gets very long. In Stata one would simply type:
reshape long x1 x2, i(subject) j(year)
Is there such a simple solution in R?
reshape can guess many of its arguments. In this case it is sufficient to specify the following. No packages are used.
reshape(d1, dir = "long", varying = 3:6, sep = "_")
giving:
       subject     x0 time x1 x2 id
1.2000     id1   male 2000  1  1  1
2.2000     id2 female 2000  2  2  2
1.2005     id1   male 2005  5  5  1
2.2005     id2 female 2005  6  6  2
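With dozens of variables, counting column positions such as 3:6 becomes error-prone. A minimal sketch, assuming every time-varying column ends in an underscore followed by digits and that each variable's columns appear in the same time order: select the columns by name pattern and pass idvar so that no extra id column is added. The names varying_cols and long are illustrative only.

```r
d1 <- data.frame(subject = c("id1", "id2"),
                 x0 = c("male", "female"),
                 x1_2000 = 1:2, x1_2005 = 5:6,
                 x2_2000 = 1:2, x2_2005 = 5:6)

# Select the time-varying columns by name instead of by position
# (assumption: they all end in "_" followed by digits)
varying_cols <- grep("_[0-9]+$", names(d1), value = TRUE)

# reshape() splits each name at sep to guess the variable names and times
long <- reshape(d1, direction = "long", varying = varying_cols,
                sep = "_", idvar = "subject")

# Drop the generated row names such as "id1.2000"
rownames(long) <- NULL
long
```

Because idvar names an existing column, the result should contain only subject, x0, time, x1 and x2.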
Here is a brief example using the reshape2 package:
library(reshape2)
library(stringr)
# it is always useful to start with melt
d2 <- melt(d1, id=c("subject", "x0"))
# redefine the time and x1, x2, ... separately
d2 <- transform(d2, time = str_replace(variable, "^.*_", ""),
                variable = str_replace(variable, "_.*$", ""))
# finally, cast as you want
d3 <- dcast(d2, subject+x0+time~variable)
Now you don't even need to specify x1 and x2.
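The two str_replace calls in the transform step are what separate the variable name from the year. As a rough illustration of what those patterns do, here is the same split using base R's sub on a plain character vector; the names variable, time_part and var_part are made up for this sketch.

```r
# Column names as they appear in the melted 'variable' column
variable <- c("x1_2000", "x1_2005", "x2_2000", "x2_2005")

# "^.*_" matches everything up to and including the last underscore,
# so removing it leaves the year (still as a character string)
time_part <- sub("^.*_", "", variable)

# "_.*$" matches from the first underscore to the end,
# so removing it leaves the variable stub
var_part <- sub("_.*$", "", variable)

time_part
var_part

# Convert to numeric if the year should be treated as a number
as.numeric(time_part)
```

Note that the time column produced this way is character, not numeric, unless it is converted explicitly.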
The same code still works when the number of variables increases:
> d1 <- data.frame(subject = c("id1", "id2"), x0 = c("male", "female"),
+ x1_2000 = 1:2,
+ x1_2005 = 5:6,
+ x2_2000 = 1:2,
+ x2_2005 = 5:6,
+ x3_2000 = 1:2,
+ x3_2005 = 5:6,
+ x4_2000 = 1:2,
+ x4_2005 = 5:6
+ )
>
> d2 <- melt(d1, id=c("subject", "x0"))
> d2 <- transform(d2, time = str_replace(variable, "^.*_", ""),
+ variable = str_replace(variable, "_.*$", ""))
>
> d3 <- dcast(d2, subject+x0+time~variable)
>
> d3
  subject     x0 time x1 x2 x3 x4
1     id1   male 2000  1  1  1  1
2     id1   male 2005  5  5  5  5
3     id2 female 2000  2  2  2  2
4     id2 female 2005  6  6  6  6
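For a call that reads closer to the Stata one-liner, base reshape can be wrapped in a small function. This is only a sketch: reshape_long is a hypothetical helper, not part of base R or any package, and it assumes column names of the form stub, separator, time.

```r
# Hypothetical helper mimicking Stata's: reshape long x1 x2, i(subject) j(year)
reshape_long <- function(data, stubs, i, j = "time", sep = "_") {
  # Flag every column whose name starts with one of the stubs plus the separator
  hits <- lapply(stubs, function(s) startsWith(names(data), paste0(s, sep)))
  varying_cols <- names(data)[Reduce(`|`, hits)]
  long <- reshape(data, direction = "long", varying = varying_cols,
                  idvar = i, timevar = j, sep = sep)
  rownames(long) <- NULL
  long
}

d1 <- data.frame(subject = c("id1", "id2"), x0 = c("male", "female"),
                 x1_2000 = 1:2, x1_2005 = 5:6,
                 x2_2000 = 1:2, x2_2005 = 5:6,
                 x3_2000 = 1:2, x3_2005 = 5:6,
                 x4_2000 = 1:2, x4_2005 = 5:6)

long <- reshape_long(d1, stubs = c("x1", "x2", "x3", "x4"),
                     i = "subject", j = "year")
long
```

Unlike the dcast result, the rows come back ordered by year and then subject; sort with order() if a different ordering is needed.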