I have a sample data frame like below:
data <- data.frame(matrix(sample(1:40), 4, 10, dimnames = list(1:4, LETTERS[1:10])))
I want to know how to select multiple columns and convert them all to factors at once. I usually convert one column at a time, e.g. data$A = as.factor(data$A). But when the data frame is very large and contains many columns, doing this column by column is very time-consuming. Does anyone know of a better way to do it?
In R, you can convert multiple numeric variables to factors with the lapply() function, which is part of the apply family of functions. These functions iterate over the elements of a list or vector without an explicit loop, and because a data frame is a list of columns, lapply() visits each column in turn. Categorical variables in R are usually stored as factors.
To convert all columns from integer to factor, use lapply() together with the factor() function.
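A minimal sketch of the all-columns case, run on a fresh copy of the question's sample data stored in dat (a hypothetical name chosen so the question's data is not overwritten). Assigning into dat[] rather than to dat keeps the result a data frame:

```r
# Fresh sample data in the same shape as the question's (values are random).
set.seed(1)  # hypothetical seed, only to make the example reproducible
dat <- data.frame(matrix(sample(1:40), 4, 10, dimnames = list(1:4, LETTERS[1:10])))

# `dat[] <- ...` fills the existing data frame, so its class, row names and
# column names survive; `dat <- lapply(dat, factor)` would leave a plain list.
dat[] <- lapply(dat, factor)

sapply(dat, class)  # every column should now report "factor"
```

The empty brackets are the whole trick: they tell R to replace the contents of dat column by column instead of rebinding the name to the list that lapply() returns.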
The reverse direction, converting a factor to numeric, needs care. Calling as.numeric() directly on a factor returns the internal level codes, not the values shown as labels. Convert the factor to character first, as.numeric(as.character(f)), or index the numeric levels with the factor, as.numeric(levels(f))[f].
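A small illustration of why the direct conversion goes wrong, using a made-up factor f whose labels look like numbers:

```r
# A factor whose labels look like numbers; levels are sorted: "5", "10", "20".
f <- factor(c(10, 5, 10, 20))

as.numeric(f)                   # 2 1 2 3: the internal level codes, not the values
as.numeric(as.character(f))     # 10 5 10 20: converts the labels themselves
as.numeric(levels(f))[f]        # 10 5 10 20: converts each level once, then indexes
```

The last form does the text-to-number conversion only once per level, which can matter when a long factor has few levels.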
The same pattern converts multiple columns from integer to numeric: base R's lapply() applies a function to each element of a list, so we apply as.numeric() to each selected column.
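A short sketch of that integer-to-numeric conversion on a subset of columns. The column choice in num_cols (and the names dat and num_cols themselves) are hypothetical, picked only for illustration:

```r
dat <- data.frame(matrix(sample(1:40), 4, 10, dimnames = list(1:4, LETTERS[1:10])))

# Hypothetical selection: any character vector of column names works here.
num_cols <- c("B", "E", "F")

# lapply() returns a list of converted columns; assigning it to dat[num_cols]
# replaces only those columns and leaves the others untouched.
dat[num_cols] <- lapply(dat[num_cols], as.numeric)

sapply(dat, class)  # B, E and F are "numeric"; the rest stay "integer"
```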
Choose some columns to coerce to factors:
cols <- c("A", "C", "D", "H")
Use lapply() to coerce and replace the chosen columns:
data[cols] <- lapply(data[cols], factor) ## as.factor() could also be used
Check the result:
sapply(data, class)
#         A         B         C         D         E         F         G 
#  "factor" "integer"  "factor"  "factor" "integer" "integer" "integer" 
#         H         I         J 
#  "factor" "integer" "integer" 
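When most columns should be converted, it can be shorter to name the few that should stay as they are and let setdiff() build the selection. A sketch under that assumption, where keep and to_convert are hypothetical names:

```r
dat <- data.frame(matrix(sample(1:40), 4, 10, dimnames = list(1:4, LETTERS[1:10])))

keep <- c("B", "E")                         # hypothetical: columns to leave as integers
to_convert <- setdiff(names(dat), keep)     # every other column, in original order

dat[to_convert] <- lapply(dat[to_convert], factor)
```

The replacement step is identical to the one above; only the way the column names are produced changes.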
Here is an option using dplyr. The %<>% operator from magrittr updates the lhs object with the resulting value.
library(magrittr)
library(dplyr)
cols <- c("A", "C", "D", "H")
# mutate_each_() and funs() are deprecated in current dplyr; across() replaces them
data %<>% mutate(across(all_of(cols), factor))
str(data)
#'data.frame':	4 obs. of  10 variables:
# $ A: Factor w/ 4 levels "23","24","26",..: 1 2 3 4
# $ B: int  15 13 39 16
# $ C: Factor w/ 4 levels "3","5","18","37": 2 1 3 4
# $ D: Factor w/ 4 levels "2","6","28","38": 3 1 4 2
# $ E: int  14 4 22 20
# $ F: int  7 19 36 27
# $ G: int  35 40 21 10
# $ H: Factor w/ 4 levels "11","29","32",..: 1 4 3 2
# $ I: int  17 1 9 25
# $ J: int  12 30 8 33
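In dplyr 1.0 and later, across() can also select columns with a predicate via where(), instead of listing names. A sketch, assuming an extra double column K (hypothetical, added only to show that non-integer columns are skipped) and the hypothetical name dat:

```r
library(dplyr)

dat <- data.frame(matrix(sample(1:40), 4, 10, dimnames = list(1:4, LETTERS[1:10])))
dat$K <- dat$A / 2   # hypothetical double column, should stay numeric

# where() tests each column; only the integer columns A..J become factors.
dat <- dat %>% mutate(across(where(is.integer), factor))
```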
Or, if we are using data.table, either use a for loop with set():
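library(data.table)
setDT(data)
# set() replaces each column by reference, without copying the whole table
for (j in cols) {
  set(data, i = NULL, j = j, value = factor(data[[j]]))
}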
Or we can specify 'cols' in .SDcols and assign (:=) the rhs to 'cols':
setDT(data)[, (cols):= lapply(.SD, factor), .SDcols=cols]
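Note that setDT() converts its argument in place, so after these snippets data is itself a data.table. If the original data frame should stay untouched, one option is as.data.table(), which returns a new object that := can then modify. A sketch, where dat and dt are hypothetical names:

```r
library(data.table)

dat <- data.frame(matrix(sample(1:40), 4, 10, dimnames = list(1:4, LETTERS[1:10])))
cols <- c("A", "C", "D", "H")

# as.data.table() builds a new table, unlike setDT(), which converts in place.
dt <- as.data.table(dat)

# := still updates dt by reference; dat keeps its integer columns.
dt[, (cols) := lapply(.SD, factor), .SDcols = cols]
```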