I have a function in which I define a data.frame that I use loops to fill with data. At some point I get the warning message:
Warning messages: 1: In `[<-.factor`(`*tmp*`, iseq, value = "CHANGE") : invalid factor level, NAs generated
Therefore, when I define my data.frame, I'd like to set the option stringsAsFactors to FALSE, but I don't understand how to do it.
I have tried:
DataFrame = data.frame(stringsAsFactors=FALSE)
and also:
options(stringsAsFactors=FALSE)
What is the correct way to set the stringsAsFactors option?
In summary, before R 4.0.0 strings were read as factors (i.e. distinct groups) by default; since R 4.0.0 the default is stringsAsFactors = FALSE. One consequence of factor encoding is that the data is stored more efficiently: each unique string gets a number, and wherever it appears in your data frame only that numerical value (which is much smaller in size) is stored.
The argument stringsAsFactors is an argument to the data.frame() function in R. It is a logical that indicates whether strings in a data frame should be treated as factor variables or as plain strings.
Sometimes a string is just a string. It is often claimed that Sigmund Freud said, “Sometimes a cigar is just a cigar.” To avoid problems, delay re-encoding of strings by using stringsAsFactors = FALSE when creating data.
The as.data.frame() function in R is used to convert an object to a data frame. These objects can be vectors, lists, matrices, and factors.
It depends on how you fill your data frame, for which you haven't given any code. When you construct a new data frame, you can do it like this:
x <- data.frame(aName = aVector, bName = bVector, stringsAsFactors = FALSE)
In this case, if e.g. aVector is a character vector, then the data frame column x$aName will be a character vector as well, and not a factor vector. Combining that with an existing data frame (using rbind, cbind or similar) should preserve that mode.
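As a minimal sketch of the construction above: the vector contents and the second data frame y are made up for illustration, since the question does not show how the data frame is filled.

```r
# Hypothetical input vectors (not from the question).
aVector <- c("KEEP", "DROP")
bVector <- c(1, 2)

# Pass stringsAsFactors explicitly so the result does not depend on the R version.
x <- data.frame(aName = aVector, bName = bVector, stringsAsFactors = FALSE)

# The column is a plain character vector, so assigning a new string
# does not trigger the "invalid factor level" warning.
x$aName[2] <- "CHANGE"

# Combining with another data frame whose column is also character
# keeps the column as character.
y <- data.frame(aName = "NEW", bName = 3, stringsAsFactors = FALSE)
z <- rbind(x, y)
```

If the loop assigns strings that were not present when the data frame was created, keeping the column as character in this way avoids the NA values from the warning.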
When you execute
options(stringsAsFactors = FALSE)
you change the global default setting. Every data frame you create after executing that line will not auto-convert strings to factors unless explicitly told to do so. If you only need to avoid conversion in a single place, then I'd rather not change the default. However, if this affects many places in your code, changing the default seems like a good idea.
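A small sketch of the global approach, assuming you want to restore the previous setting afterwards; the data frame contents are invented for illustration. options() returns the old values, which makes it easy to undo the change.

```r
# Change the global default and remember the previous setting.
old <- options(stringsAsFactors = FALSE)

# No stringsAsFactors argument here: the global default applies.
df <- data.frame(label = c("a", "b"))

# An explicit argument still overrides the global default.
df_factor <- data.frame(label = c("a", "b"), stringsAsFactors = TRUE)

# Restore whatever was set before.
options(old)
```

Here df$label is a character vector, while df_factor$label is a factor because the conversion was requested explicitly.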
One more thing: if your vector already contains factors, then neither of the above will change it back into a character vector. To do so, you should explicitly convert it back using as.character or similar.
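For example, a rough sketch of converting existing factor columns back (the column names and values are hypothetical):

```r
# A data frame whose column is already a factor.
df <- data.frame(status = factor(c("OPEN", "CLOSED")))

# Convert the single column back to character before assigning new strings.
df$status <- as.character(df$status)
df$status[1] <- "CHANGE"

# To convert every factor column at once, select them first.
df2 <- data.frame(a = factor(c("x", "y")), b = 1:2)
is_fac <- vapply(df2, is.factor, logical(1))
df2[is_fac] <- lapply(df2[is_fac], as.character)
```

Non-factor columns such as df2$b are left untouched by the second variant.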