
In R, how can I change many select (binary) columns in a dataframe into factors?

Tags:

dataframe

r

I have a dataset with many columns and I'd like to locate the columns that have fewer than n unique responses and change just those columns into factors.

Here is one way I was able to do that:

# create sample dataframe
df <- data.frame("number" = c(1, 2.7, 8, 5), "binary1" = c(1, 0, 1, 1),
                 "answer" = c("Yes", "No", "Yes", "No"), "binary2" = c(0, 0, 1, 0))
n <- 3

# for each column
for (col in colnames(df)) {
  # check if the column is numeric
  if (is.numeric(df[[col]])) {
    # check that there are fewer than n unique values
    if (length(unique(df[[col]])) < n) {
      df[[col]] <- factor(df[[col]])
    }
  }
}

What is another, hopefully more succinct, way of accomplishing this?

asked by Mark on Dec 22 '22 at 15:12



1 Answer

Here is a way using tidyverse.

We can use where() inside across() to select the columns, with a short-circuiting (&&) predicate that checks:

  1. whether the column is numeric (is.numeric);
  2. if 1 is TRUE, whether the number of distinct elements is less than the user-defined n;
  3. if 2 is TRUE, whether all the unique elements in the column are 0 or 1.

across() then loops over the selected columns and converts each one to the factor class.

library(dplyr)
df1 <- df %>% 
     mutate(across(where(~is.numeric(.) && 
                           n_distinct(.) < n && 
                           all(unique(.) %in% c(0, 1))),  factor))

Checking:

str(df1)
'data.frame':   4 obs. of  4 variables:
 $ number : num  1 2.7 8 5
 $ binary1: Factor w/ 2 levels "0","1": 2 1 2 2
 $ answer : chr  "Yes" "No" "Yes" "No"
 $ binary2: Factor w/ 2 levels "0","1": 1 1 2 1
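For comparison, here is a minimal base R sketch of the same selection without dplyr, assuming the sample df and n from the question. It applies the same three conditions to each column with vapply() and then converts only the flagged columns with lapply(). The names is_binary and df2 are illustrative, not part of the original answer.

```r
# Sample data and threshold from the question
df <- data.frame(number = c(1, 2.7, 8, 5), binary1 = c(1, 0, 1, 1),
                 answer = c("Yes", "No", "Yes", "No"), binary2 = c(0, 0, 1, 0))
n <- 3

# Hypothetical helper vector: TRUE for numeric columns with fewer than n
# distinct values, all of which are 0 or 1 (&& short-circuits like the dplyr version)
is_binary <- vapply(df, function(x) {
  is.numeric(x) && length(unique(x)) < n && all(x %in% c(0, 1))
}, logical(1))

# Assigning a list back into df2[is_binary] keeps df2 a data.frame
df2 <- df
df2[is_binary] <- lapply(df2[is_binary], factor)
str(df2)
```

Using vapply() with logical(1) guarantees a plain logical vector (one entry per column), which can then index the data frame directly; if no column qualifies, the assignment simply changes nothing.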
answered by akrun on Apr 07 '23 at 11:04
