
How do I stop merge from converting characters into factors?

Tags:

r

E.g.

chr <- c("a", "b", "c")
intgr <- c(1, 2, 3)
str(chr)
str(base::merge(chr,intgr, stringsAsFactors = FALSE))

gives:

> str(base::merge(chr,intgr, stringsAsFactors = FALSE))
'data.frame':   9 obs. of  2 variables:
 $ x: Factor w/ 3 levels "a","b","c": 1 2 3 1 2 3 1 2 3
 $ y: num  1 1 1 2 2 2 3 3 3

I assume this has something to do with how merge coerces its arguments into data frames. I expected that passing stringsAsFactors = FALSE would override the default character-to-factor coercion, but it has no effect.

EDIT: Doing the following gives me the expected behaviour:

options(stringsAsFactors = FALSE)
str(base::merge(chr,intgr))

that is:

> str(base::merge(chr,intgr))
'data.frame':   9 obs. of  2 variables:
 $ x: chr  "a" "b" "c" "a" ...
 $ y: num  1 1 1 2 2 2 3 3 3

but this is not ideal as it changes the global stringsAsFactors setting.

Alex asked Jun 22 '16 23:06


2 Answers

You can accomplish this particular "merge" using expand.grid(), since you're really just taking the Cartesian product. This allows you to pass the stringsAsFactors argument:

sapply(expand.grid(x = chr, y = intgr, stringsAsFactors = FALSE), class)
##           x           y
## "character"   "numeric"

Here's a way of working around this limitation of merge():

sapply(merge(data.frame(x = chr, stringsAsFactors = FALSE), intgr), class)
##           x           y
## "character"   "numeric"

I would argue that it never makes sense to pass an atomic vector to merge(), since it is only really designed for merging data.frames.
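To illustrate that last point, here is a minimal sketch that converts both vectors to data frames explicitly before merging. It rests on two things I believe to be true of base R: when merge() is given something that is not a data frame, it calls as.data.frame() on it without forwarding stringsAsFactors to that conversion, and since R 4.0.0 the default for stringsAsFactors is FALSE, so the factor conversion should only show up on older versions. The names left, right and res are mine, and the column names x and y are chosen to match the output in the question.

```r
chr <- c("a", "b", "c")
intgr <- c(1, 2, 3)

# Build the data frames ourselves so the string handling is explicit
# and does not depend on the R version or on global options.
left  <- data.frame(x = chr, stringsAsFactors = FALSE)
right <- data.frame(y = intgr)

# The two data frames share no column names; by = NULL spells out that
# a cross join (Cartesian product) is what we want.
res <- merge(left, right, by = NULL)

str(res)
```

This leaves the global stringsAsFactors option untouched, which was the concern raised in the question.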

bgoldst answered Sep 21 '22 03:09


We can use CJ from data.table as well:

library(data.table)
str(CJ(chr, intgr))
# Classes ‘data.table’ and 'data.frame':  9 obs. of  2 variables:
#  $ V1: chr  "a" "a" "a" "b" ...
#  $ V2: num  1 2 3 1 2 3 1 2 3
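A small variation on the call above, offered as a sketch rather than the definitive usage: naming the arguments to CJ() sets the column names directly instead of relying on the defaults, which as far as I know differ between data.table versions. CJ() also sorts its result by default; passing sorted = FALSE keeps the input order. The column names x and y and the variable res are my own choice.

```r
library(data.table)

chr <- c("a", "b", "c")
intgr <- c(1, 2, 3)

# Named arguments become the column names of the result;
# sorted = FALSE skips the default sorting and keying.
res <- CJ(x = chr, y = intgr, sorted = FALSE)

sapply(res, class)
```

The character column stays a character vector here, with no stringsAsFactors argument involved.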
like image 23
akrun Avatar answered Sep 23 '22 03:09

akrun