 

How can I efficiently construct a very long factor with few levels?

Tags: performance, r

In R, I want to create a factor with only a few levels but a length of almost 100 million. The "normal" way for me to create a factor is to call factor on a character vector, but I expect this to be very inefficient at this size. What is the proper way to construct a long factor without fully expanding the corresponding character vector?

Here is an example of the wrong way to do it: creating and then factoring a character vector:

long.char.vector = sample(c("left", "middle", "right"), 50000000, replace=TRUE)
long.factor = factor(long.char.vector)

How can I construct long.factor without first constructing long.char.vector? Yes, I know those two lines of code can be combined, but the combined line still creates the gigantic character vector.
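For context, an R factor is stored as an integer vector of level codes together with a levels attribute and the class "factor", so each element costs one integer rather than one character-string pointer. The tiny example below (object names chosen only for illustration) shows that layout:

```r
# A tiny factor: three elements, two levels (levels are sorted by default)
f <- factor(c("left", "right", "left"))

# Underneath, the values are integer codes that index into levels(f)
typeof(f)        # "integer"
levels(f)        # "left" "right"
as.integer(f)    # 1 2 1
```

This is why a long factor can, in principle, be built from integer codes alone, without ever materialising the strings.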

Ryan C. Thompson asked Dec 21 '22 at 15:12



1 Answer

It's not going to be much more efficient, but you can sample a factor vector:

big.factor <- sample(factor(c("left", "middle", "right")), 5e7, replace=TRUE)
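A related variant, not part of the original answer, is to sample the integer codes directly with sample.int() and attach the levels yourself, which skips both the long character vector and the factor subsetting step. This is a sketch assuming the levels are known up front; make_long_factor is a hypothetical helper name. It relies on structure() setting the levels and class attributes, which yields a valid factor only because the sampled codes are integers in 1..length(levels).

```r
# Hypothetical helper: build a factor of length n by sampling integer codes
# directly, so no length-n character vector is ever allocated.
make_long_factor <- function(lev, n) {
  # Integer codes in 1..length(lev), drawn uniformly with replacement
  codes <- sample.int(length(lev), n, replace = TRUE)
  # Attach the levels and the "factor" class to turn the codes into a factor
  structure(codes, levels = lev, class = "factor")
}

set.seed(1)
big.factor <- make_long_factor(c("left", "middle", "right"), 5e7)
str(big.factor)
```

Whether this is noticeably faster than sampling a factor depends on the machine and R version; timing both with system.time() on realistic sizes is the safest way to decide.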
Joshua Ulrich answered Jan 31 '23 at 00:01
