
Coerce multiple columns to factors at once

I have a sample data frame like below:

data <- data.frame(matrix(sample(1:40), 4, 10, dimnames = list(1:4, LETTERS[1:10]))) 

I want to know how I can select multiple columns and convert them all to factors at once. I usually do it one column at a time, like data$A = as.factor(data$A), but when the data frame is very large and contains lots of columns, this becomes very time consuming. Does anyone know of a better way to do it?

wsda asked Oct 16 '15 21:10



2 Answers

Choose some columns to coerce to factors:

cols <- c("A", "C", "D", "H") 

Use lapply() to coerce and replace the chosen columns:

data[cols] <- lapply(data[cols], factor)  ## as.factor() could also be used 

Check the result:

sapply(data, class)
#        A         B         C         D         E         F         G
# "factor" "integer"  "factor"  "factor" "integer" "integer" "integer"
#        H         I         J
# "factor" "integer" "integer"
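When there are too many columns to list by name, the same lapply() pattern also works with a positional or logical selection. The following is a minimal sketch, not part of the original answer; the seed, the index vector idx, and the helper vector is_int are illustrative assumptions.

```r
# Assumption: fixed seed only so the example is reproducible.
set.seed(1)
data <- data.frame(matrix(sample(1:40), 4, 10,
                          dimnames = list(1:4, LETTERS[1:10])))

# Select by position instead of by name (columns 1, 3, 4, 8 are A, C, D, H).
idx <- c(1, 3, 4, 8)
data[idx] <- lapply(data[idx], factor)

# Or select by a condition: here, every column that is still an integer.
is_int <- vapply(data, is.integer, logical(1))
data[is_int] <- lapply(data[is_int], factor)

# Every column should now be a factor.
all(vapply(data, is.factor, logical(1)))
```

The logical selection is convenient on wide data frames because it never requires typing column names, but it converts every matching column, so check the condition before using it on real data.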
Rich Scriven answered Sep 23 '22 21:09


Here is an option using dplyr. The %<>% operator from magrittr updates the lhs object with the resulting value.

library(magrittr)
library(dplyr)
cols <- c("A", "C", "D", "H")

data %<>%
  mutate_each_(funs(factor(.)), cols)
str(data)
#'data.frame':  4 obs. of  10 variables:
# $ A: Factor w/ 4 levels "23","24","26",..: 1 2 3 4
# $ B: int  15 13 39 16
# $ C: Factor w/ 4 levels "3","5","18","37": 2 1 3 4
# $ D: Factor w/ 4 levels "2","6","28","38": 3 1 4 2
# $ E: int  14 4 22 20
# $ F: int  7 19 36 27
# $ G: int  35 40 21 10
# $ H: Factor w/ 4 levels "11","29","32",..: 1 4 3 2
# $ I: int  17 1 9 25
# $ J: int  12 30 8 33
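Note that mutate_each_() and funs() have been deprecated in later dplyr releases. A rough equivalent, assuming dplyr 1.0.0 or newer, uses across() together with all_of(); this is a sketch of the same idea rather than the answer's original code.

```r
library(dplyr)

data <- data.frame(matrix(sample(1:40), 4, 10,
                          dimnames = list(1:4, LETTERS[1:10])))
cols <- c("A", "C", "D", "H")

# across() applies factor() to each selected column;
# all_of() selects columns from a character vector of names.
data <- data %>%
  mutate(across(all_of(cols), factor))

sapply(data, class)
```

Here the result is assigned back with <- instead of %<>%, so magrittr does not need to be loaded separately.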

Or, if we are using data.table, either use a for loop with set():

library(data.table)
setDT(data)
for (j in cols) {
  set(data, i = NULL, j = j, value = factor(data[[j]]))
}

Or we can specify 'cols' in .SDcols and assign (:=) the result of the rhs back to those columns:

setDT(data)[, (cols):= lapply(.SD, factor), .SDcols=cols] 
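For completeness, here is a self-contained sketch of the .SDcols approach with a check of the result. It assumes the data.table package is installed; the final setDF() call is an optional extra, not part of the answer, for readers who want a plain data frame back.

```r
library(data.table)

data <- data.frame(matrix(sample(1:40), 4, 10,
                          dimnames = list(1:4, LETTERS[1:10])))
cols <- c("A", "C", "D", "H")

# setDT() converts the data frame to a data.table by reference (no copy).
setDT(data)

# Replace the chosen columns in place with their factor versions.
data[, (cols) := lapply(.SD, factor), .SDcols = cols]

sapply(data, class)

# Optional: convert back to a plain data.frame, also by reference.
setDF(data)
```

Both setDT() and := modify the object in place, which is why no reassignment of data is needed.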
akrun answered Sep 23 '22 21:09