
Change stringsAsFactors settings for data.frame

I have a function in which I define a data.frame that I use loops to fill with data. At some point I get the Warning message:

Warning messages:
1: In `[<-.factor`(`*tmp*`, iseq, value = "CHANGE") :
  invalid factor level, NAs generated

Therefore, when I define my data.frame, I'd like to set the option stringsAsFactors to FALSE but I don't understand how to do it.

I have tried:

DataFrame = data.frame(stringsAsFactors=FALSE) 

and also:

options(stringsAsFactors=FALSE) 

What is the correct way to set the stringsAsFactors option?

VincentH asked Jul 18 '12 09:07





1 Answer

It depends on how you fill your data frame, for which you haven't given any code. When you construct a new data frame, you can do it like this:

x <- data.frame(aName = aVector, bName = bVector, stringsAsFactors = FALSE) 

In this case, if aVector is a character vector, for example, then the data frame column x$aName will be a character vector as well, not a factor. Combining that with an existing data frame (using rbind, cbind or similar) should preserve the column type.
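As a minimal sketch of the difference, the following uses made-up data (the names labels, as_factor and as_char are illustrative and do not come from the question). It requests factors explicitly so the behaviour is the same regardless of the R version's default; note that since R 4.0.0 the default for data.frame() is already stringsAsFactors = FALSE.

```r
# Hypothetical example data, not taken from the question.
labels <- c("KEEP", "DROP")

# With factors, assigning a value that is not an existing level
# produces an "invalid factor level" warning and stores NA.
as_factor <- data.frame(id = 1:2, status = labels, stringsAsFactors = TRUE)
as_factor$status[1] <- "CHANGE"
is.na(as_factor$status[1])   # TRUE

# With plain character columns, any string can be assigned.
as_char <- data.frame(id = 1:2, status = labels, stringsAsFactors = FALSE)
as_char$status[1] <- "CHANGE"
class(as_char$status)        # "character"
as_char$status[1]            # "CHANGE"
```

If the data frame is filled in a loop, as in the question, the second form is probably what is wanted.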

When you execute

options(stringsAsFactors = FALSE) 

you change the global default setting. So every data frame you create after executing that line will not auto-convert strings to factors unless explicitly told to do so. If you only need to avoid conversion in a single place, then I'd rather not change the default. However, if this affects many places in your code, changing the default seems like a good idea.

One more thing: if your vector is already a factor, then neither of the above will change it back into a character vector. To do so, you should explicitly convert it back using as.character or similar.
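A rough sketch of that conversion, assuming a data frame whose string columns have already become factors (df, is_factor and f are hypothetical names used only for illustration):

```r
# Hypothetical data frame whose string column is already a factor.
df <- data.frame(name = c("a", "b"), score = c(1.5, 2.5),
                 stringsAsFactors = TRUE)

# Find the factor columns and convert only those back to character.
is_factor <- vapply(df, is.factor, logical(1))
df[is_factor] <- lapply(df[is_factor], as.character)
class(df$name)    # "character"
class(df$score)   # "numeric"

# For factors that hold numbers, go through as.character first:
# as.numeric() on a factor returns the internal level codes.
f <- factor(c("20", "10", "20"))
as.numeric(f)                 # 2 1 2  (level codes)
as.numeric(as.character(f))   # 20 10 20
```

Converting only the columns flagged by is.factor leaves numeric and other columns untouched.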

MvG answered Oct 06 '22 00:10
