I have a dataset with many columns and I'd like to locate the columns that have fewer than n
unique responses and change just those columns into factors.
Here is one way I was able to do that:
# create sample dataframe
df <- data.frame("number" = c(1,2.7,8,5), "binary1" = c(1,0,1,1),
                 "answer" = c("Yes","No", "Yes", "No"), "binary2" = c(0,0,1,0))
n <- 3
# for each column
for (col in colnames(df)){
  # check if the first entry is numeric
  if (is.numeric(df[col][1,1])){
    # check that there are fewer than n unique values
    if ( length(unique(df[col])[,1]) < n ) {
      df[[col]] <- factor(df[[col]])
    }
  }
}
What is another, hopefully more succinct, way of accomplishing this?
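Here is a way using the tidyverse.
We can use where within across to select the columns with a short-circuiting logical expression (&&) that checks, in order:
- the column is numeric (is.numeric)
- the number of distinct values is less than the user-defined n (n_distinct)
- all the unique elements in the column are 0 and 1
The selected columns are then converted to factor class.
Note that the last check is stricter than the loop in the question, which only requires fewer than n unique values.
library(dplyr)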
df1 <- df %>%
  mutate(across(where(~ is.numeric(.) &&
                        n_distinct(.) < n &&
                        all(unique(.) %in% c(0, 1))), factor))
Checking the result:
str(df1)
'data.frame': 4 obs. of 4 variables:
$ number : num 1 2.7 8 5
$ binary1: Factor w/ 2 levels "0","1": 2 1 2 2
$ answer : chr "Yes" "No" "Yes" "No"
$ binary2: Factor w/ 2 levels "0","1": 1 1 2 1
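For comparison, here is a minimal base R sketch of the same idea without dplyr. It applies only the two criteria from the question (the column is numeric and has fewer than n unique values) and leaves out the 0/1 check, so treat it as an illustration rather than a drop-in equivalent of the code above. The name low_card is made up for this example.

```r
# Sample data from the question
df <- data.frame(number = c(1, 2.7, 8, 5), binary1 = c(1, 0, 1, 1),
                 answer = c("Yes", "No", "Yes", "No"), binary2 = c(0, 0, 1, 0))
n <- 3

# Logical vector: TRUE for numeric columns with fewer than n unique values.
# low_card is a hypothetical name, not part of the original answer.
low_card <- vapply(df,
                   function(x) is.numeric(x) && length(unique(x)) < n,
                   logical(1))

# Convert only the flagged columns to factors; other columns are untouched
df[low_card] <- lapply(df[low_card], factor)

str(df)
```

Using vapply with logical(1) rather than sapply guarantees the result is a logical vector with one entry per column, so the subsetting in the last step behaves predictably.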