In R, I want to create a factor with only a few levels but a length of almost 100 million. The "normal" way for me to create a factor is to call factor on a character vector, but I expect this method to be very inefficient. What is the proper way to construct a long factor without fully expanding the corresponding character vector?
Here is an example of the wrong way to do it, which creates a character vector and then converts it to a factor:
long.char.vector = sample(c("left", "middle", "right"), replace=TRUE, 50000000)
long.factor = factor(long.char.vector)
How can I construct long.factor without first constructing long.char.vector? Yes, I know those two lines of code can be combined, but the resulting line of code still creates the gigantic character vector anyway.
It's not going to be much more efficient, but you can sample a factor vector:
big.factor <- sample(factor(c("left", "middle", "right")), replace=TRUE, 5e7)
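Another option, if you want to skip character data entirely, is to sample the integer codes and attach the levels yourself. This is only a sketch: make_long_factor is a name I made up for illustration, and I have not benchmarked it against the line above. It relies on the fact that a factor is stored as an integer vector with a levels attribute and class "factor".

```r
# Hypothetical helper (not from the original answer): build a factor of
# length n directly from integer codes, so no character vector of length n
# is ever created.
make_long_factor <- function(n, lvls) {
  # Draw codes 1..length(lvls) with replacement; the result is an integer vector.
  codes <- sample.int(length(lvls), n, replace = TRUE)
  # Attach the levels and the class to turn the codes into a factor.
  structure(codes, levels = lvls, class = "factor")
}

# A smaller n keeps the demo quick; n = 5e7 from the question works the same way.
long.factor <- make_long_factor(1e6, c("left", "middle", "right"))
str(long.factor)
```

Because the levels are attached by hand, nothing checks that the codes match the levels, so keep the codes within 1..length(lvls).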