Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Alternatives to stats::reshape

Tags:

r

reshape

The melt/cast functions in the reshape package are great, but I'm not sure if there is a simple way to apply them when measured variables are of different types. For example, here is a snippet from data where each MD provides the gender and weight of three patients:

ID PT1 WT1 PT2 WT2 PT3 WT3
1  "M" 170 "M" 175 "F" 145
...

where the objective is to reshape so each row is a patient:

ID PTNUM GENDER WEIGHT
1    1     "M"    170
1    2     "M"    175
1    3     "F"    145
...

Using the reshape function in the stats package is one option of which I'm aware, but I'm posting here in the hopes that R users more experienced than me will post other, hopefully better methods. Many thanks!

--

@Vincent Zoonekynd :

I liked your example a lot, so I generalized it to multiple variables.

# Sample data
n <- 5
d <- data.frame(
  id = 1:n,
  p1 = sample(c("M","F"),n,replace=TRUE),
  q1 = sample(c("Alpha","Beta"),n,replace=TRUE),
  w1 = round(runif(n,100,200)),
  y1 = round(runif(n,100,200)),
  p2 = sample(c("M","F"),n,replace=TRUE),
  q2 = sample(c("Alpha","Beta"),n,replace=TRUE),
  w2 = round(runif(n,100,200)),
  y2 = round(runif(n,100,200)),
  p3 = sample(c("M","F"),n,replace=TRUE),
  q3 = sample(c("Alpha","Beta"),n,replace=TRUE),
  w3 = round(runif(n,100,200)),
  y3 = round(runif(n,100,200))
  )
# Reshape the data.frame, one variable at a time
library(reshape)
d1 <- melt(d, id.vars="id", measure.vars=c("p1","p2","p3","q1","q2","q3"))
d2 <- melt(d, id.vars="id", measure.vars=c("w1","w2","w3","y1","y2","y3"))
d1 = cbind(d1,colsplit(d1$variable,names=c("var","ptnum")))
d2 = cbind(d2,colsplit(d2$variable,names=c("var","ptnum")))
d1$variable = NULL
d2$variable = NULL
d1c = cast(d1,...~var)
d2c = cast(d2,...~var)
# Join the two data.frames
d3 = merge(d1c, d2c, by=c("id","ptnum"), all=TRUE)

--

Final thoughts: my motivation for this question was to learn about alternatives to the reshape package other than the stats::reshape function. For the moment, I've reached the following conclusions:

  • Stick to stats::reshape when you can. As long as you remember to use a list rather than a simple vector for the "varying" argument, you'll stay out of trouble. For smaller data sets--a few thousand patient cases with less than 200 variables in total is what I was dealing with this time--the lower speed of this function is worth the simplicity of the code.

  • To use the cast/melt approach in Hadley Wickham's reshape (or reshape2) package, you have to split your variables into two sets, one consisting of numeric variables and another of character variables. When your data set is large enough that you find stats::reshape unbearable, I imagine the extra step of dividing your variables into two sets won't seem so bad.

like image 639
Krishna Tateneni Avatar asked Feb 18 '12 14:02

Krishna Tateneni


2 Answers

You can process each variable separately, and join the resulting two data.frames.

# Sample data
n <- 5
d <- data.frame(
  id = 1:n,
  pt1 = sample(c("M","F"),n,replace=TRUE),
  wt1 = round(runif(n,100,200)),
  pt2 = sample(c("M","F"),n,replace=TRUE),
  wt2 = round(runif(n,100,200)),
  pt3 = sample(c("M","F"),n,replace=TRUE),
  wt3 = round(runif(n,100,200))
)
# Reshape the data.frame, one variable at a time
library(reshape2)
d1 <- melt(d, 
  id.vars="id", measure.vars=c("pt1","pt2","pt3"), 
  variable.name="patient", value.name="gender"
)
d2 <- melt(d, 
  id.vars="id", measure.vars=c("wt1","wt2","wt3"), 
  variable.name="patient", value.name="weight"
)
d1$patient <- as.numeric(gsub("pt", "", d1$patient))
d2$patient <- as.numeric(gsub("wt", "", d1$patient))
# Join the two data.frames
merge(d1, d2, by=c("id","patient"), all=TRUE)
like image 71
Vincent Zoonekynd Avatar answered Oct 19 '22 09:10

Vincent Zoonekynd


I think the reshape function in the stats package is simplest. Here is a simple example, does this do what you want?

> tmp
  id val val2 cat
1  1   1   14   a
2  1   2   13   b
3  2   3   12   b
4  2   4   11   a
> tmp2 <- tmp
> tmp2$t <- ave(tmp2$val, tmp2$id, FUN=seq_along)
> tmp2
  id val val2 cat t
1  1   1   14   a 1
2  1   2   13   b 2
3  2   3   12   b 1
4  2   4   11   a 2
> reshape(tmp2, idvar='id', timevar='t', direction='wide')
  id val.1 val2.1 cat.1 val.2 val2.2 cat.2
1  1     1     14     a     2     13     b
3  2     3     12     b     4     11     a

Hopefully your patients sex is not changing each appointment, but there could be other categorical variables that change between visits

like image 41
Greg Snow Avatar answered Oct 19 '22 10:10

Greg Snow