I have a data frame. Let's call him bob:
> head(bob)
                 phenotype                         exclusion
GSM399350 3- 4- 8- 25- 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399351 3- 4- 8- 25- 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399352 3- 4- 8- 25- 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399353 3- 4- 8- 25+ 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399354 3- 4- 8- 25+ 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399355 3- 4- 8- 25+ 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
I'd like to concatenate the rows of this data frame (this will be another question). But look:
> class(bob$phenotype)
[1] "factor"
Bob's columns are factors. So, for example:
> as.character(head(bob))
[1] "c(3, 3, 3, 6, 6, 6)"       "c(3, 3, 3, 3, 3, 3)"
[3] "c(29, 29, 29, 30, 30, 30)"
I don't begin to understand this, but I guess these are indices into the levels of the factors of the columns (of the court of king caractacus) of bob? Not what I need.
Strangely, I can go through the columns of bob by hand, and do
bob$phenotype <- as.character(bob$phenotype)
which works fine. And, after some typing, I can get a data.frame whose columns are characters rather than factors. So my question is: how can I do this automatically? How do I convert a data.frame with factor columns into a data.frame with character columns without having to manually go through each column?
Bonus question: why does the manual approach work?
To convert a factor column to character, apply the as.character function to that column of the data frame. For example, if a data frame df contains a factor column named Gender, then as.character(df$Gender) returns its values as a character vector.
The factor() function is used to create and modify factors in R. When a factor is converted into a numeric vector using as.numeric(), the integer codes corresponding to the factor levels are returned, not the level labels.
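The difference between the labels and the underlying codes is easy to miss, so here is a minimal sketch. The factor f is a made-up toy example, not taken from the question's data.

```r
# Hypothetical toy factor whose labels happen to look like numbers
f <- factor(c("10", "20", "10", "30"))

labels_chr <- as.character(f)              # the level labels as strings
codes_num  <- as.numeric(f)                # the internal integer codes
values_num <- as.numeric(as.character(f))  # the numbers the labels spell out

print(labels_chr)  # "10" "20" "10" "30"
print(codes_num)   # 1 2 1 3
print(values_num)  # 10 20 10 30
```

Going through as.character first is the usual way to recover the original values when the labels are numeric.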
To convert all columns of a data frame from integer to factor, we can use lapply with the factor function.
The droplevels() function in R can be used to drop unused factor levels. This function is particularly useful if we want to drop factor levels that are no longer used due to subsetting a vector or a data frame.
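As a small illustration of the subsetting case, assuming a made-up factor f (not from the question):

```r
# Hypothetical toy factor with three levels
f <- factor(c("a", "b", "c"))

# Subsetting removes the value "c" but keeps it as a level
sub <- f[f != "c"]
levels_before <- levels(sub)

# droplevels() discards levels that no longer occur in the data
levels_after <- levels(droplevels(sub))

print(levels_before)  # "a" "b" "c"
print(levels_after)   # "a" "b"
```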
Just following on Matt and Dirk. If you want to recreate your existing data frame without changing the global option, you can recreate it with an apply statement:
bob <- data.frame(lapply(bob, as.character), stringsAsFactors=FALSE)
This will convert all variables to class "character". If you want to convert only factors, see Marek's solution below.
As @hadley points out, the following is more concise.
bob[] <- lapply(bob, as.character)
In both cases, lapply outputs a list; however, owing to the magical properties of R, the use of [] in the second case keeps the data.frame class of the bob object, thereby eliminating the need to convert back to a data.frame using as.data.frame with the argument stringsAsFactors = FALSE.
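To see the effect of [] for yourself, here is a small sketch on a made-up data frame (df and its columns are assumptions for illustration, not the bob from the question):

```r
# Hypothetical toy data frame with two factor columns
df <- data.frame(x = factor(c("a", "b")), y = factor(c("u", "v")))

# lapply on its own returns a plain list
as_list <- lapply(df, as.character)
class_of_list <- class(as_list)

# Assigning into df[] replaces the columns but keeps the container
df[] <- lapply(df, as.character)
class_of_df <- class(df)
column_classes <- sapply(df, class)

print(class_of_list)   # "list"
print(class_of_df)     # "data.frame"
print(column_classes)  # x and y are both "character"
```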
To replace only factors:
i <- sapply(bob, is.factor)
bob[i] <- lapply(bob[i], as.character)
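The point of selecting with is.factor is that other columns keep their types. A quick check on a made-up mixed data frame (df, id and grp are hypothetical names):

```r
# Hypothetical toy data frame: one integer column, one factor column
df <- data.frame(id = 1:3, grp = factor(c("a", "b", "a")))

# Logical vector marking which columns are factors
i <- sapply(df, is.factor)

# Convert only those columns; id is left alone
df[i] <- lapply(df[i], as.character)
column_classes <- sapply(df, class)

print(column_classes)  # id stays "integer", grp becomes "character"
```

By contrast, applying as.character to every column would turn id into strings as well.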
Version 0.5.0 of the dplyr package introduced a new function, mutate_if:
library(dplyr)
bob %>% mutate_if(is.factor, as.character) -> bob
...and in version 1.0.0 it was superseded by across:
library(dplyr)
bob %>% mutate(across(where(is.factor), as.character)) -> bob
Package purrr from RStudio gives another alternative:
library(purrr)
bob %>% modify_if(is.factor, as.character) -> bob