I'm new to R / having the option to easily re-organize data, and have hunted around for a solution but can't find exactly what I'd like to do. Reshape2's melt/cast doesn't quite seem to work and I haven't mastered plyr well enough to factor it in here.
Basically I have a data.frame with a structure outlined below, with a category column in which each element is a variable-length list of categories (more compact because the # columns is much larger, and I actually have multiple category_lists that I'd like to keep separate):
>mydf
ID category_list xval yval
1 ID1 cat1, cat2, cat3 xnum1 ynum1
2 ID2 cat2, cat3 xnum2 ynum2
3 ID3 cat1 xnum3 ynum3
I want to do manipulations with the categories as factors (and the values associated, i.e. columns 3/4), so I think I need something like this in the end, where IDs and x/y/other column values are duplicated according to the length of the category list:
ID category xval yval
1 ID1 cat1 xnum1 ynum1
2 ID1 cat2 xnum1 ynum1
3 ID1 cat3 xnum1 ynum1
4 ID2 cat2 xnum2 ynum2
5 ID2 cat3 xnum2 ynum2
6 ID3 cat3 xnum2 ynum2
If there's another solution to factor/facet on the category_list, that would be a simpler solution but I haven't come across methods that support this, e.g. the following throws an error
>ggplot(mydf, aes(x=x, y=y)) + geom_point() + facet_grid(~cat_list)
Error in layout_base(data, cols, drop = drop) : At least one layer must contain all variables used for facetting
Thanks!
The answer will depend on the format of category_list
. If in fact it is a list
for each row
Something like
mydf <- data.frame(ID = paste0('ID',1:3),
category_list = I(list(c('cat1','cat2','cat3'), c('cat2','cat3'), c('cat1'))),
xval = 1:3, yval = 1:3)
or
library(data.table)
mydf <- as.data.frame(data.table(ID = paste0('ID',1:3),
category_list = list(c('cat1','cat2','cat3'), c('cat2','cat3'), c('cat1')),
xval = 1:3, yval = 1:3) )
Then you can use plyr
and merge
to create your long form data
newdf <- merge(mydf, ddply(mydf, .(ID), summarize, cat_list = unlist(category_list)), by = 'ID')
ID category_list xval yval cat_list
1 ID1 cat1, cat2, cat3 1 1 cat1
2 ID1 cat1, cat2, cat3 1 1 cat2
3 ID1 cat1, cat2, cat3 1 1 cat3
4 ID2 cat2, cat3 2 2 cat2
5 ID2 cat2, cat3 2 2 cat3
6 ID3 cat1 3 3 cat1
or a non-plyr approach that doesn't require merge
do.call(rbind,lapply(split(mydf, mydf$ID), transform, cat_list = unlist(category_list)))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With