As part of piloting a survey, I presented each Turker with sets of choices amongst four alternatives. The data looks like this:
> so
WorkerId pio_1_1 pio_1_2 pio_1_3 pio_1_4 pio_2_1 pio_2_2 pio_2_3 pio_2_4
1 1 Yes No No No No No Yes No
2 2 No Yes No No Yes No Yes No
3 3 Yes Yes No No Yes No Yes No
I'd like it to look like this:
WorkerId set pio1 pio2 pio3 pio4
1 1 Yes No No No
1 2 No No Yes No
...
I can kludge through this by a number of means, none of which seem very elegant:
But it seems to me that all of these ignore the idea that data in what you might call "double wide" format has its own structure. I'd love to use the reshape2 package for this, but despite the data having been produced with cast() I don't see any options that would help me truly melt this data.frame back.
Suggestions welcome.
so <- structure(list(WorkerId = 1:3, pio_1_1 = structure(c(2L, 1L,
2L), .Label = c("No", "Yes"), class = "factor"), pio_1_2 = structure(c(1L,
2L, 2L), .Label = c("No", "Yes"), class = "factor"), pio_1_3 = structure(c(1L,
1L, 1L), .Label = c("No", "Yes"), class = "factor"), pio_1_4 = structure(c(1L,
1L, 1L), .Label = "No", class = "factor"), pio_2_1 = structure(c(1L,
2L, 2L), .Label = c("No", "Yes"), class = "factor"), pio_2_2 = structure(c(1L,
1L, 1L), .Label = c("No", "Yes"), class = "factor"), pio_2_3 = structure(c(2L,
2L, 2L), .Label = c("No", "Yes"), class = "factor"), pio_2_4 = structure(c(1L,
1L, 1L), .Label = "No", class = "factor")), .Names = c("WorkerId",
"pio_1_1", "pio_1_2", "pio_1_3", "pio_1_4", "pio_2_1", "pio_2_2",
"pio_2_3", "pio_2_4"), row.names = c(NA, 3L), class = "data.frame")
I would suggest doing some gsub on your names to get them into a form that R likes better, in other words, with the time variable being the last item, not the middle item, and with a "." being the default separator.
Try this:
names(so) = gsub("([a-z])_([0-9])_([0-9])", "\\1_\\3\\.\\2", names(so))
so.l = reshape(so, direction="long", varying=2:9, timevar="set", idvar=1)
Then, if you want to sort by WorkerId:
so.l = so.l[order(so.l$WorkerId), ]
If we call your original data set dat this will do it using base:
dat2 <- reshape(dat,
varying=list(pio_1= c(2, 6), pio_2= c(3,7), pio_3= c(4,8), pio_4= c(5,9) ),
v.names=c(paste0("pio_",1:4)),
idvar = "WorkerId",
direction="long",
timevar="set")
row.names(dat2) <- NULL
dat2[order(dat2$WorkerId, dat2$set), ]
Which yields:
WorkerId set pio_1 pio_2 pio_3 pio_4
1 1 1 Yes No No No
2 1 2 No No Yes No
3 2 1 No Yes No No
4 2 2 Yes No Yes No
5 3 1 Yes Yes No No
6 3 2 Yes No Yes No
EDIT: (Alright I couldn't resist taking a crack at making it easier to automate)
y <- do.call('rbind', strsplit(names(dat)[-1], "_"))[, c(1, 3, 2)]
names(dat) <- c(names(dat)[1], paste0(y[, 1], "_", y[, 2], ".", y[, 3]))
dat2 <- reshape(dat,
varying=2:9,
idvar = "WorkerId",
direction="long",
timevar="set")
row.names(dat2) <- NULL
dat2[order(dat2$WorkerId, dat2$set), ]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With