
R concatenating two factors

Tags:

r

This is making me feel dumb, but I am trying to produce a single vector/df/list/etc (anything but a matrix) concatenating two factors. Here's the scenario. I have a 100k line dataset. I used the top half to predict the bottom half and vice versa using knn. So now I have 2 objects created by knn predict().

> head(pred11)
[1] 0 0 0 0 0 0
Levels: 0 1
> head(pred12)
[1] 0 1 1 0 0 0
Levels: 0 1
> class(pred11)
[1] "factor"
> class(pred12)
[1] "factor"

Here's where my problem starts:

> pred13 <- rbind(pred11, pred12)
> class(pred13)
[1] "matrix"

There are 2 problems. First, it changes the 0's and 1's to 1's and 2's, and second, it seems to create a huge matrix that eats all my memory. I've tried messing with as.numeric(), data.frame(), etc., but can't get it to just combine the two 50k factors into one 100k factor. Any suggestions?

screechOwl asked Nov 22 '11 16:11

2 Answers

@James presented one way, I'll chip in with another (shorter):

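set.seed(42)
x1 <- factor(sample(0:1, 10, replace=TRUE))
x2 <- factor(sample(0:1, 10, replace=TRUE))

# Put both factors in a list, then flatten the list into one factor
unlist(list(x1, x2))
# [1] 1 1 0 1 1 1 1 0 1 1 0 1 1 0 0 1 1 0 0 1
# Levels: 0 1
# (the sampled values depend on the R version's random number generator)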

...This might seem a bit like magic, but unlist has special support for factors for this particular purpose! All elements in the list must be factors for this to work.
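
As a small sketch of that behaviour (the data below is made up for illustration and is not from the question), the combined factor should carry the union of the level sets, and the trick stops producing a factor as soon as one list element is not a factor:

```r
# Two factors whose level sets only partly overlap (made-up data)
f1 <- factor(c("a", "b", "a"))
f2 <- factor(c("b", "c"))

# All list elements are factors, so unlist() returns a factor
combined <- unlist(list(f1, f2))
print(combined)
levels(combined)     # union of both level sets

# One element is a plain character vector, so the result is not a factor
mixed <- unlist(list(f1, c("b", "c")))
is.factor(mixed)
```

Checking is.factor() on the result is a cheap guard if the list is built programmatically.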

Tommy answered Sep 28 '22 13:09


rbind will create a 2 x 50000 matrix in your case, which isn't what you want. c is the correct function to combine 2 vectors into a single longer vector. When you use rbind on a factor (or c, in R versions before 4.1.0), it uses the underlying integer codes that map to the levels. In general you need to combine the values as character and then convert the result back to a factor:

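x1 <- factor(sample(0:1, 10, replace=TRUE))
x2 <- factor(sample(0:1, 10, replace=TRUE))

# Convert to character first so the labels, not the integer codes, are combined
factor(c(as.character(x1), as.character(x2)))
# [1] 1 1 1 0 1 1 0 1 0 0 0 1 1 1 1 1 1 0 0 0
# Levels: 0 1
# (no seed is set, so the sampled values will differ from run to run)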
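
To make the integer-code issue concrete, here is a minimal sketch using two tiny hand-written factors (p1 and p2 are stand-ins for pred11 and pred12, not the real predictions). It shows the 1/2 codes appearing under rbind and the labels surviving the character round trip:

```r
# Tiny stand-ins for the two prediction factors (made-up values)
p1 <- factor(c("0", "0", "1"), levels = c("0", "1"))
p2 <- factor(c("1", "1", "0"), levels = c("0", "1"))

# rbind() stacks the integer codes, so labels "0"/"1" show up as 1/2
m <- rbind(p1, p2)
dim(m)            # one row per input factor
as.integer(p1)    # the codes behind the labels

# Going through character keeps the original labels
p <- factor(c(as.character(p1), as.character(p2)))
length(p)
levels(p)

# To recover the numbers 0/1, convert to character before numeric
as.numeric(as.character(p))
```

Calling as.numeric() directly on a factor returns the codes rather than the labels, which is why the last step goes through as.character().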

James answered Sep 28 '22 14:09