Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Reshape in the middle

Tags:

r

reshape

As part of piloting a survey, I presented each Turker with sets of choices amongst four alternatives. The data looks like this:

> so
  WorkerId pio_1_1 pio_1_2 pio_1_3 pio_1_4 pio_2_1 pio_2_2 pio_2_3 pio_2_4
1        1     Yes      No      No      No      No      No     Yes      No
2        2      No     Yes      No      No     Yes      No     Yes      No
3        3     Yes     Yes      No      No     Yes      No     Yes      No

I'd like it to look like this:

WorkerId set pio1 pio2 pio3 pio4
       1   1  Yes   No   No   No
       1   2   No   No  Yes   No
...

I can kludge through this by a number of means, none of which seem very elegant:

  • Swapping the order of the numbers with regexes and backreferencing and then using reshape()
  • Writing my own little function to parse out the first digit between the underscores and then reshape it long
  • Splitting and then stacking the columns (relies on the ordering being right)

But it seems to me that all of these ignore the idea that data in what you might call "double wide" format has its own structure. I'd love to use the reshape2 package for this, but despite the data having been produced with cast() I don't see any options that would help me truly melt this data.frame back.

Suggestions welcome.

so <- structure(list(WorkerId = 1:3, pio_1_1 = structure(c(2L, 1L, 
2L), .Label = c("No", "Yes"), class = "factor"), pio_1_2 = structure(c(1L, 
2L, 2L), .Label = c("No", "Yes"), class = "factor"), pio_1_3 = structure(c(1L, 
1L, 1L), .Label = c("No", "Yes"), class = "factor"), pio_1_4 = structure(c(1L, 
1L, 1L), .Label = "No", class = "factor"), pio_2_1 = structure(c(1L, 
2L, 2L), .Label = c("No", "Yes"), class = "factor"), pio_2_2 = structure(c(1L, 
1L, 1L), .Label = c("No", "Yes"), class = "factor"), pio_2_3 = structure(c(2L, 
2L, 2L), .Label = c("No", "Yes"), class = "factor"), pio_2_4 = structure(c(1L, 
1L, 1L), .Label = "No", class = "factor")), .Names = c("WorkerId", 
"pio_1_1", "pio_1_2", "pio_1_3", "pio_1_4", "pio_2_1", "pio_2_2", 
"pio_2_3", "pio_2_4"), row.names = c(NA, 3L), class = "data.frame")
like image 554
Ari B. Friedman Avatar asked Apr 15 '12 11:04

Ari B. Friedman


2 Answers

I would suggest doing some gsub on your names to get them into a form that R likes better, in other words, with the time variable being the last item, not the middle item, and with a "." being the default separator.

Try this:

names(so) = gsub("([a-z])_([0-9])_([0-9])", "\\1_\\3\\.\\2", names(so))
so.l = reshape(so, direction="long", varying=2:9, timevar="set", idvar=1)

Then, if you want to sort by WorkerId:

so.l = so.l[order(so.l$WorkerId), ]
like image 118
A5C1D2H2I1M1N2O1R2T1 Avatar answered Oct 03 '22 15:10

A5C1D2H2I1M1N2O1R2T1


If we call your original data set dat this will do it using base:

dat2 <- reshape(dat, 
    varying=list(pio_1= c(2, 6), pio_2= c(3,7), pio_3= c(4,8), pio_4= c(5,9) ),
    v.names=c(paste0("pio_",1:4)), 
    idvar = "WorkerId",
    direction="long", 
    timevar="set") 
row.names(dat2) <- NULL
dat2[order(dat2$WorkerId, dat2$set), ]

Which yields:

  WorkerId set pio_1 pio_2 pio_3 pio_4
1        1   1   Yes    No    No    No
2        1   2    No    No   Yes    No
3        2   1    No   Yes    No    No
4        2   2   Yes    No   Yes    No
5        3   1   Yes   Yes    No    No
6        3   2   Yes    No   Yes    No

EDIT: (Alright I couldn't resist taking a crack at making it easier to automate)

y <- do.call('rbind', strsplit(names(dat)[-1], "_"))[, c(1, 3, 2)]
names(dat) <- c(names(dat)[1], paste0(y[, 1], "_", y[, 2], ".", y[, 3]))

dat2 <- reshape(dat, 
    varying=2:9, 
    idvar = "WorkerId",
    direction="long", 
    timevar="set")
row.names(dat2) <- NULL
dat2[order(dat2$WorkerId, dat2$set), ]
like image 44
Tyler Rinker Avatar answered Oct 03 '22 15:10

Tyler Rinker