
R - Function to make a binary variable

I have some variables which take values between 1 and 5. I would like to recode them as 0 if the value is between 1 and 3 (inclusive) and as 1 if the value is 4 or 5.

My dataset looks like this:

var1    var2        var3
1       1            NA
4       3            4
3       4            5
2       5            3

So I would like it to be like this:

var1    var2        var3
0       0            NA
1       0            1
0       1            1
0       1            0

I tried writing a function and applying it to every column:

making_binary <- function (var){
  var <- factor(var >= 4, labels = c(0, 1))
  return(var)
}


df <- lapply(df, making_binary)

But I got an error: incorrect labels: length 2 must be 1 or 1

Where did I go wrong? Thank you very much for your answers!

Emeline asked May 03 '26 12:05


2 Answers

You can use:

df[] <- +(df == 4 | df == 5)
df
#  var1 var2 var3
#1    0    0   NA
#2    1    0    1
#3    0    1    1
#4    0    1    0

The comparison df == 4 | df == 5 returns logical values (TRUE/FALSE), and the unary + converts them to integers (1/0). NA values stay NA. Assigning into df[] rather than df keeps the result a data frame.
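
To see the coercion in isolation, here is a minimal sketch on a plain vector. The names x, flag and as_int are made up for illustration. Because the values only range from 1 to 5, x >= 4 should give the same result as x == 4 | x == 5.

```r
# Hypothetical example vector with one missing value
x <- c(1L, 4L, NA, 5L, 3L)

# Logical comparison: TRUE where x is 4 or 5, NA where x is NA
flag <- x == 4 | x == 5

# Unary plus coerces logical to integer (TRUE -> 1, FALSE -> 0)
as_int <- +flag

# For values restricted to 1..5 a single threshold test is equivalent
same_as_threshold <- identical(as_int, +(x >= 4))
```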

If you want to apply this to selected columns only, you can subset the columns by position or by name.

cols <- 1:3 # by position
#cols <- grep('var', names(df)) # by name pattern
df[cols] <- +(df[cols] == 4 | df[cols] == 5)
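
As an illustration of why the subsetting matters, here is a small sketch with a hypothetical data frame dat that also holds an id column which should not be recoded. The column names and the "^var" pattern are assumptions for the example, not part of the original data.

```r
# Hypothetical data: one identifier column plus two rating columns
dat <- data.frame(
  id   = c("a", "b", "c"),
  var1 = c(1L, 4L, 3L),
  var2 = c(5L, 2L, NA)
)

# Positions of the columns whose names start with "var"
cols <- grep("^var", names(dat))

# Recode only those columns; id is left untouched
dat[cols] <- +(dat[cols] >= 4)
```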

As far as your function is concerned, you can do:

making_binary <- function(var) {
  # TRUE/FALSE becomes 1/0, NA stays NA
  # This is a faster version of ifelse(var >= 4, 1, 0)
  as.integer(var >= 4)
}

df[] <- lapply(df, making_binary)
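
On the question of what went wrong: one likely cause, assuming some column in the real data has no values on one side of the threshold, is that factor() then sees a single level while two labels are supplied. The sketch below reproduces that situation with a made-up vector all_low and also shows why df[] <- is used instead of df <-.

```r
# Hypothetical column in which no value reaches 4
all_low <- c(1, 2, 3)

# Only one level ("FALSE") is present, so two labels are rejected;
# tryCatch() captures the error message instead of stopping
err <- tryCatch(
  factor(all_low >= 4, labels = c(0, 1)),
  error = function(e) conditionMessage(e)
)

# Declaring both levels up front avoids the mismatch
fixed <- factor(all_low >= 4, levels = c(FALSE, TRUE), labels = c(0, 1))

# lapply() returns a plain list; assigning into df[] keeps the data frame
df <- data.frame(var1 = c(1, 4, 3, 2), var2 = c(1, 3, 4, 5))
as_list <- lapply(df, function(v) as.integer(v >= 4))
df[] <- as_list
```

Note that the factor() route yields a factor with levels "0" and "1" rather than numbers, which is one reason as.integer() is the simpler choice here.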

data

df <- structure(list(var1 = c(1L, 4L, 3L, 2L), var2 = c(1L, 3L, 4L, 
5L), var3 = c(NA, 4L, 5L, 3L)), class = "data.frame", row.names = c(NA, -4L))

Ronak Shah answered May 05 '26 02:05


I think ifelse would fit the problem well:

df[] <- lapply(df, function(x) ifelse(x >= 1 & x <= 3, 0, x))
df
  var1 var2 var3
1    0    0   NA
2    4    0    4
3    0    4    5
4    0    5    0
df[] <- lapply(df, function(x) ifelse(x >= 4 & x <= 5, 1, x))

df
  var1 var2 var3
1    0    0   NA
2    1    0    1
3    0    1    1
4    0    1    0

If you need to do the two steps at once, you can look at dplyr::case_when() or data.table::fcase().
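
A rough sketch of the single-step version, using the sample data from the question. The base R variant needs no packages; the dplyr::case_when() variant only runs if dplyr is installed, and the object names one_step and with_case_when are made up for the example.

```r
df <- data.frame(
  var1 = c(1L, 4L, 3L, 2L),
  var2 = c(1L, 3L, 4L, 5L),
  var3 = c(NA, 4L, 5L, 3L)
)

# Base R: a single ifelse() covers both ranges, NA stays NA
one_step <- df
one_step[] <- lapply(df, function(x) ifelse(x >= 4, 1, 0))

# dplyr: one condition per range; unmatched values (here NA) become NA
if (requireNamespace("dplyr", quietly = TRUE)) {
  with_case_when <- df
  with_case_when[] <- lapply(df, function(x) {
    dplyr::case_when(x >= 4 ~ 1, x <= 3 ~ 0)
  })
}
```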

Eyayaw answered May 05 '26 03:05