R mutate multiple columns with ifelse

Tags: r, tidyverse

This is similar to another question (R Mutate multiple columns with ifelse()-condition), but I'm having trouble applying that answer to my problem.

Here's a reproducible example:

df <- structure(list(comm_id = c("060015", "060015", "060015", "060015", 
"060015", "060015", "060015", "060015", "060015", "060015", "060015"
), trans_year = c(1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 
2000, 2001, 2002), f10_1 = c(1996, 1996, 1996, 1996, 1996, 1996, 
1996, 1996, 1996, 1996, 1996), f10_2 = c(1997, 1997, 1997, 1997, 
1997, 1997, 1997, 1997, 1997, 1997, 1997)), row.names = c(NA, 
-11L), class = c("tbl_df", "tbl", "data.frame"))

I want to create additional indicator columns using an ifelse() condition. The brute-force version below works, but my actual problem has more than 10 such columns, so it would benefit a lot from a more elegant approach.

df %>%
  mutate(post_f10_1 = ifelse(trans_year >= f10_1 & trans_year < f10_1 +5, 1, 0),
         post_f10_2 = ifelse(trans_year >= f10_2 & trans_year < f10_2 +5, 1, 0))

I've tried a couple of approaches, both of which failed:

With base R,

n <- c(1:2)
df[paste0("post_f10_", n)] <- lapply(n, function(x) 
  ifelse(df$trans_year >= paste0("f10_", x) & df$trans_year < paste0("f10_", x) + 5, 1, 0))
#  Error in paste0("f10_", x) + 5 : non-numeric argument to binary operator 

With the new across() function from the tidyverse,

df %>%
  mutate(across(starts_with("f10_"), 
                ~ ifelse(trnas_year >= .x & trans_year < .x + 5, 1, 0), .names = "post_{col}"))
# Error: Problem with `mutate()` input `..1`.
# x object 'trnas_year' not found
# ℹ Input `..1` is `across(...)`.

The output I want looks like

  comm_id trans_year f10_1 f10_2 post_f10_1 post_f10_2
   <chr>        <dbl> <dbl> <dbl>      <dbl>      <dbl>
 1 060015        1992  1996  1997          0          0
 2 060015        1993  1996  1997          0          0
 3 060015        1994  1996  1997          0          0
 4 060015        1995  1996  1997          0          0
 5 060015        1996  1996  1997          1          0
 6 060015        1997  1996  1997          1          1
 7 060015        1998  1996  1997          1          1
 8 060015        1999  1996  1997          1          1
 9 060015        2000  1996  1997          1          1
10 060015        2001  1996  1997          0          1
11 060015        2002  1996  1997          0          0

If possible, I'd prefer tidyverse approach. Thanks!

Update

My original tidyverse approach failed only because of a typo (trnas_year instead of trans_year), so I'm updating the post with the corrected version. That said, the answer below is much more elegant than what I posted here.

df %>%
  mutate(across(starts_with("f10_"), 
                ~ ifelse(trans_year >= .x & trans_year < .x + 5, 1, 0), .names = "post_{col}"))
# A tibble: 11 x 6
   comm_id trans_year f10_1 f10_2 post_f10_1 post_f10_2
   <chr>        <dbl> <dbl> <dbl>      <dbl>      <dbl>
 1 060015        1992  1996  1997          0          0
 2 060015        1993  1996  1997          0          0
 3 060015        1994  1996  1997          0          0
 4 060015        1995  1996  1997          0          0
 5 060015        1996  1996  1997          1          0
 6 060015        1997  1996  1997          1          1
 7 060015        1998  1996  1997          1          1
 8 060015        1999  1996  1997          1          1
 9 060015        2000  1996  1997          1          1
10 060015        2001  1996  1997          0          1
11 060015        2002  1996  1997          0          0
asked Jul 23 '20 06:07 by qnp1521


1 Answer

You can use across() and convert the logical comparison directly with as.integer():

library(dplyr)

df %>%  
     mutate(across(starts_with("f10_"), 
               ~as.integer(trans_year >= . & trans_year < (. + 5)), 
               .names = 'post_{col}'))


#  comm_id trans_year f10_1 f10_2 post_f10_1 post_f10_2
#   <chr>        <dbl> <dbl> <dbl>      <int>      <int>
# 1 060015        1992  1996  1997          0          0
# 2 060015        1993  1996  1997          0          0
# 3 060015        1994  1996  1997          0          0
# 4 060015        1995  1996  1997          0          0
# 5 060015        1996  1996  1997          1          0
# 6 060015        1997  1996  1997          1          1
# 7 060015        1998  1996  1997          1          1
# 8 060015        1999  1996  1997          1          1
# 9 060015        2000  1996  1997          1          1
#10 060015        2001  1996  1997          0          1
#11 060015        2002  1996  1997          0          0
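
The as.integer() call works because the comparison already yields a logical vector, and TRUE/FALSE convert to 1/0. Here is a minimal base R sketch of that equivalence. It uses the trans_year values from the question, and the name `start` is only a hypothetical stand-in for one of the f10_* columns. The values match the ifelse() version; only the storage type differs (integer rather than double), which is why the output above shows <int> columns.

```r
# trans_year values from the question; `start` is a hypothetical stand-in
# for a single f10_* column (here the f10_1 value, 1996)
trans_year <- 1992:2002
start <- 1996

# The condition itself is already a logical vector
in_window <- trans_year >= start & trans_year < start + 5

via_ifelse  <- ifelse(in_window, 1, 0)  # double 0/1
via_integer <- as.integer(in_window)    # integer 0L/1L

# Same values, different storage type
all(via_ifelse == via_integer)
typeof(via_ifelse)
typeof(via_integer)
```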

Or in base R with lapply():

cols <- paste0('f10_', 1:2)

df[paste0('post_', cols)] <- lapply(df[cols], function(x) 
          as.integer(df$trans_year >= x & df$trans_year < (x + 5)))
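
The base R attempt in the question fails because paste0("f10_", x) returns the column name as a character string, not the column itself, so adding 5 to it raises the "non-numeric argument" error. If you prefer to loop over the numeric suffixes as in that attempt, one possible fix is to look the column up by name with df[[...]]. The sketch below assumes a small plain data.frame standing in for the question's tibble; the variable names `i` and `start` are illustrative only.

```r
# Small stand-in for the question's data (plain data.frame, base R only)
df <- data.frame(
  trans_year = 1992:2002,
  f10_1 = 1996,
  f10_2 = 1997
)

n <- 1:2
df[paste0("post_f10_", n)] <- lapply(n, function(i) {
  # Fetch the column by its name instead of using the name string itself
  start <- df[[paste0("f10_", i)]]
  as.integer(df$trans_year >= start & df$trans_year < start + 5)
})

df$post_f10_1
df$post_f10_2
```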
answered Oct 08 '22 03:10 by Ronak Shah