
unsplit list, merge factors

I have the following data frame in R:

  c1 c2  
1 10  a  
2 20  a  
3 30  b  
4 40  b

I then split c1 by c2 and cut each group into two intervals:

z = lapply(split(test$c1, test$c2), function(x) {cut(x,2)})

z is then:

$a  
[1] (9.99,15] (15,20]  
Levels: (9.99,15] (15,20]

$b  
[1] (30,35] (35,40]
Levels: (30,35] (35,40]  

I would then like to merge the factors back by unsplitting the list with unsplit(z, test$c2). This generates a warning, and the values from the second group come back as NA:

[1] (9.99,15] (15,20]   <NA>      <NA>     
Levels: (9.99,15] (15,20]
Warning message:
In `[<-.factor`(`*tmp*`, i, value = 1:2) :
  invalid factor level, NAs generated

I would like to take the union of all the factor levels and then unsplit, so that this warning does not occur. With two list elements I can do this by hand:

z$a = factor(z$a, levels=c(levels(z$a), levels(z$b)))
unsplit(z, test$c2)
[1] (9.99,15] (15,20]   (30,35]   (35,40]  
Levels: (9.99,15] (15,20] (30,35] (35,40]    

In my real data frame I have a very big list so I need to iterate over all the list elements (not just two). What is the best way to do this?

Alex asked Feb 24 '23 09:02


1 Answer

If I understand your question correctly, I think you are making this a bit more complicated than it needs to be. Here's one solution using plyr. We group by the c2 variable and apply cut() to c1 within each group:

library(plyr)
# Split test by c2, add the cut() result as newvar within each group, then recombine
ddply(test, "c2", transform, newvar = cut(c1, 2))

which returns:

  c1 c2    newvar
1 10  a (9.99,15]
2 20  a   (15,20]
3 30  b   (30,35]
4 40  b   (35,40]

and has a structure of:

'data.frame':   4 obs. of  3 variables:
 $ c1    : num  10 20 30 40
 $ c2    : Factor w/ 2 levels "a","b": 1 1 2 2
 $ newvar: Factor w/ 4 levels "(9.99,15]","(15,20]",..: 1 2 3 4
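
If you would rather stay in base R and keep your split/unsplit approach, the manual re-levelling from your question can be generalised to a list of any length. This is only a minimal sketch, assuming the same test data frame as in the question; all_levels, z_full and merged are names I made up for illustration:

```r
# Same example data as in the question
test <- data.frame(c1 = c(10, 20, 30, 40), c2 = c("a", "a", "b", "b"))
z <- lapply(split(test$c1, test$c2), function(x) cut(x, 2))

# Union of the levels of every list element, in order of first appearance
all_levels <- unique(unlist(lapply(z, levels), use.names = FALSE))

# Rebuild each factor against the full level set
z_full <- lapply(z, factor, levels = all_levels)

# Every element now shares the same levels, so unsplit() has nothing to drop
merged <- unsplit(z_full, test$c2)
```

Because every element of z_full carries the full set of levels, the result should not depend on which element happens to come first in the list.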
Chase answered Mar 11 '23 12:03