Convert all data frame character columns to factors

Tags: dataframe, r

Given a (pre-existing) data frame that has columns of various types, what is the simplest way to convert all its character columns to factors, without affecting any columns of other types?

Here's an example data.frame:

    df <- data.frame(A = factor(LETTERS[1:5]),
                     B = 1:5, C = as.logical(c(1, 1, 0, 0, 1)),
                     D = letters[1:5],
                     E = paste(LETTERS[1:5], letters[1:5]),
                     stringsAsFactors = FALSE)
    df
    #   A B     C D   E
    # 1 A 1  TRUE a A a
    # 2 B 2  TRUE b B b
    # 3 C 3 FALSE c C c
    # 4 D 4 FALSE d D d
    # 5 E 5  TRUE e E e
    str(df)
    # 'data.frame':  5 obs. of  5 variables:
    #  $ A: Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5
    #  $ B: int  1 2 3 4 5
    #  $ C: logi  TRUE TRUE FALSE FALSE TRUE
    #  $ D: chr  "a" "b" "c" "d" ...
    #  $ E: chr  "A a" "B b" "C c" "D d" ...

I know I can do:

    df$D <- as.factor(df$D)
    df$E <- as.factor(df$E)

Is there a way to automate this process a bit more?

Museful asked Dec 17 '13 14:12


People also ask

How do you convert multiple columns to factors?

In R, you can convert several columns to factors at once with lapply(), a member of the apply family of functions that calls a given function on each element of a list (or each column of a data frame). Categorical variables are typically stored as factors.

How do you change all columns to factor in R?

To convert every column of a data frame from integer to factor, apply factor() to each column with lapply(), for example df[] <- lapply(df, factor).

How do you turn a character into a factor?

To convert a character vector to a factor, pass it to as.factor() (or factor()). The reverse conversion, from a factor to a character vector, uses as.character().
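
As a minimal sketch of the single-vector case (the vector x and its values are made up for illustration), as.factor() derives the levels from the sorted unique values, while factor() lets you state the level order yourself:

```r
# A character vector with repeated values (illustrative data)
x <- c("low", "high", "medium", "low")

# as.factor() creates one level per unique value, in sorted order
f <- as.factor(x)
levels(f)

# factor() accepts an explicit level order when the sorted one is not wanted
g <- factor(x, levels = c("low", "medium", "high"))
levels(g)

# Converting back with as.character() recovers the original strings
identical(as.character(f), x)
```

The explicit levels argument mostly matters when the order carries meaning, for instance in plots or model contrasts.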


2 Answers

Roland's answer is great for this specific problem, but I thought I would share a more generalized approach.

    DF <- data.frame(x = letters[1:5], y = 1:5, z = LETTERS[1:5],
                     stringsAsFactors=FALSE)
    str(DF)
    # 'data.frame':  5 obs. of  3 variables:
    #  $ x: chr  "a" "b" "c" "d" ...
    #  $ y: int  1 2 3 4 5
    #  $ z: chr  "A" "B" "C" "D" ...

    ## The conversion
    DF[sapply(DF, is.character)] <- lapply(DF[sapply(DF, is.character)],
                                           as.factor)
    str(DF)
    # 'data.frame':  5 obs. of  3 variables:
    #  $ x: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5
    #  $ y: int  1 2 3 4 5
    #  $ z: Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5

For the conversion, the left-hand side of the assignment (DF[sapply(DF, is.character)]) selects the columns that are character. On the right-hand side, lapply applies whatever conversion you need to that same subset of columns. Assigning the resulting list back replaces the selected columns in place and leaves the others untouched.

The handy thing about this is if you wanted to go the other way or do other conversions, it's as simple as changing what you're looking for on the left and specifying what you want to change it to on the right.
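
To illustrate going the other way, here is a sketch that is not part of the original answer: it starts from a data frame whose string columns are already factors and converts them back to character, changing only the test (is.factor) and the conversion (as.character). The helper name is_fac is just for readability.

```r
# Example data with factor columns (stringsAsFactors = TRUE forces factors)
DF <- data.frame(x = letters[1:5], y = 1:5, z = LETTERS[1:5],
                 stringsAsFactors = TRUE)

# Logical vector marking which columns are factors
is_fac <- sapply(DF, is.factor)

# Replace only those columns with their character equivalents
DF[is_fac] <- lapply(DF[is_fac], as.character)

str(DF)
```

Storing the logical index once also avoids evaluating the sapply call twice, which can help readability on wide data frames.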

A5C1D2H2I1M1N2O1R2T1 answered Sep 28 '22 08:09


    DF <- data.frame(x=letters[1:5], y=1:5, stringsAsFactors=FALSE)

    str(DF)
    #'data.frame':  5 obs. of  2 variables:
    # $ x: chr  "a" "b" "c" "d" ...
    # $ y: int  1 2 3 4 5

You can use as.data.frame to turn all character columns into factor columns:

    DF <- as.data.frame(unclass(DF),stringsAsFactors=TRUE)
    str(DF)
    #'data.frame':  5 obs. of  2 variables:
    # $ x: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5
    # $ y: int  1 2 3 4 5
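
As a quick check that is not part of the original answer, the same call can be tried on the data frame from the question to see which column classes change. The names df2, before and after are illustrative only.

```r
# The example data frame from the question
df <- data.frame(A = factor(LETTERS[1:5]),
                 B = 1:5,
                 C = as.logical(c(1, 1, 0, 0, 1)),
                 D = letters[1:5],
                 E = paste(LETTERS[1:5], letters[1:5]),
                 stringsAsFactors = FALSE)

# Record each column's class before and after the conversion
before <- sapply(df, class)
df2 <- as.data.frame(unclass(df), stringsAsFactors = TRUE)
after <- sapply(df2, class)

# Side-by-side comparison: only D and E should differ
rbind(before, after)
```

unclass() strips the data.frame class so that as.data.frame() rebuilds the object from a plain list, which is what lets stringsAsFactors = TRUE take effect on the character columns.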

Roland answered Sep 28 '22 08:09