replace values with NA in several columns

Tags:

r

I have a huge dataset and wish to replace values in certain columns (VAR1, VAR2) with NA if they do not start with AA or DD.

Data:

DF<-tibble::tribble(
  ~ID,  ~VAR1,    ~VAR1DATE,  ~VAR2,    ~VAR2DATE,
   1L, "AABB", "2001-01-01", "BBAA", "2001-01-01",
   2L, "AACC", "2001-01-02", "AACC", "2001-01-02",
   3L, "CCDD", "2001-01-03", "DDCC", "2001-01-03",
   4L, "DDAA", "2001-01-04", "CCBB", "2001-01-04",
   5L, "CCBB", "2001-01-05", "CCBB", "2001-01-05"
  )

Desired output:

A tibble: 5 × 5
     ID VAR1  VAR1DATE   VAR2  VAR2DATE  
  <int> <chr> <chr>      <chr> <chr>     
1     1 AABB  2001-01-01 NA    NA        
2     2 AACC  2001-01-02 AACC  2001-01-02
3     3 NA    NA         DDCC  2001-01-03
4     4 DDAA  2001-01-04 NA    NA        
5     5 NA    NA         NA    NA

Is there an elegant and smart way to do this? Would mutate_all work?

asked Dec 27 '21 18:12

hkolvs

5 Answers

We can do this in two steps. First, loop across the columns whose names are 'VAR' followed by digits (\\d+) and replace every value whose first two characters are not AA or DD with NA. Then, for each DATE column, replace the value with NA wherever the corresponding 'VAR1' or 'VAR2' column is now NA.

library(dplyr)
library(stringr)
DF %>%
    mutate(across(matches("^VAR\\d+$"),
        ~ replace(., !substr(., 1, 2)  %in% c("AA", "DD"), NA)), 
      across(ends_with("DATE"), 
     ~ replace(., is.na(get(str_remove(cur_column(), "DATE"))), NA)))

Output:

# A tibble: 5 × 5
     ID VAR1  VAR1DATE   VAR2  VAR2DATE  
  <int> <chr> <chr>      <chr> <chr>     
1     1 AABB  2001-01-01 <NA>  <NA>      
2     2 AACC  2001-01-02 AACC  2001-01-02
3     3 <NA>  <NA>       DDCC  2001-01-03
4     4 DDAA  2001-01-04 <NA>  <NA>      
5     5 <NA>  <NA>       <NA>  <NA>
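To see what the first step computes in isolation, here is a minimal base R sketch. The vector x is a hypothetical toy input, not part of the original data; it includes an NA to show that missing values are treated as "do not keep" by both the substr and the regex form of the test.

```r
# Hypothetical toy vector; the NA entry shows how missing values behave.
x <- c("AABB", "CCDD", "DDAA", NA)

# Compare the first two characters against the allowed prefixes.
keep_substr <- substr(x, 1, 2) %in% c("AA", "DD")

# The same test written as an anchored regular expression.
keep_regex <- grepl("^(AA|DD)", x)

# replace() sets every position where the mask is FALSE to NA.
cleaned <- replace(x, !keep_substr, NA)
print(cleaned)
```

Either mask can be used inside across(); the substr form avoids regular expressions entirely, which may be easier to read when all prefixes have the same length.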

answered Oct 21 '22 23:10

akrun

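Use mutate/across with the assignment function is.na<-. Note that starts_with("VAR") also selects VAR1DATE and VAR2DATE, and since no date starts with AA or DD, both date columns come back entirely NA, as the output shows.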

DF %>%
  mutate(across(starts_with("VAR"), \(x) `is.na<-`(x, !grepl("^AA|^DD", x))))
## A tibble: 5 x 5
#     ID VAR1  VAR1DATE VAR2  VAR2DATE
#  <int> <chr> <chr>    <chr> <chr>   
#1     1 AABB  NA       NA    NA      
#2     2 AACC  NA       AACC  NA      
#3     3 NA    NA       DDCC  NA      
#4     4 DDAA  NA       NA    NA      
#5     5 NA    NA       NA    NA

Or simpler:

DF %>%
  mutate(across(starts_with("VAR"), ~`is.na<-`(., !grepl("^AA|^DD", .))))
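If the date columns should be blanked only where the matching code column fails the test, one option is to compute the mask once per code column and apply is.na<- to both columns. The following is a base R sketch under that assumption, not the answer's own method; it rebuilds the question's data as a plain data.frame so it runs without extra packages, and code_cols and bad are illustrative names.

```r
# The question's data, rebuilt as a plain data.frame (assumption: base R only).
dates <- paste0("2001-01-0", 1:5)
DF <- data.frame(
  ID = 1:5,
  VAR1 = c("AABB", "AACC", "CCDD", "DDAA", "CCBB"),
  VAR1DATE = dates,
  VAR2 = c("BBAA", "AACC", "DDCC", "CCBB", "CCBB"),
  VAR2DATE = dates,
  stringsAsFactors = FALSE
)

# Select only VAR1, VAR2, ... and not the VARnDATE columns.
code_cols <- grep("^VAR[0-9]+$", names(DF), value = TRUE)

for (col in code_cols) {
  # TRUE where the value does not start with AA or DD.
  bad <- !grepl("^(AA|DD)", DF[[col]])
  # Blank the code column and its paired date column with the same mask.
  is.na(DF[[col]]) <- bad
  is.na(DF[[paste0(col, "DATE")]]) <- bad
}

print(DF)
```

Because the mask is computed before either column is modified, the order of the two assignments inside the loop does not matter.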

answered Oct 22 '22 01:10

Rui Barradas

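Here is a tidyverse solution. It uses across with str_detect and if_else on the VAR columns, then two ifelse statements for the dates.

library(dplyr)
library(stringr)

DF %>% 
  # keep values that start with AA or DD; "^(AA|DD)" is an anchored alternation,
  # whereas a character class like [^AA|DD] would test only the first character
  mutate(across(c(VAR1, VAR2), ~ if_else(str_detect(., "^(AA|DD)"), ., NA_character_))) %>% 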
  mutate(VAR1DATE = ifelse(is.na(VAR1), NA_character_, VAR1DATE),
         VAR2DATE = ifelse(is.na(VAR2), NA_character_, VAR2DATE))

     ID VAR1  VAR1DATE   VAR2  VAR2DATE  
  <int> <chr> <chr>      <chr> <chr>     
1     1 AABB  2001-01-01 NA    NA        
2     2 AACC  2001-01-02 AACC  2001-01-02
3     3 NA    NA         DDCC  2001-01-03
4     4 DDAA  2001-01-04 NA    NA        
5     5 NA    NA         NA    NA
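A regex detail worth checking when writing this kind of prefix test: square brackets define a character class, so a pattern such as "^[^AA|DD]" means "first character is not A, | or D" rather than "does not start with AA or DD". The short base R sketch below compares the two on a hypothetical vector x chosen to expose the difference.

```r
# Hypothetical inputs: "ABCD" and "DXYZ" start with A or D but not with AA or DD.
x <- c("AABB", "ABCD", "DXYZ", "BBAA")

# Character class: TRUE only when the first character is not A, | or D.
first_char_only <- grepl("^[^AA|DD]", x)

# Negated anchored alternation: TRUE when the string does not start with AA or DD.
not_prefix <- !grepl("^(AA|DD)", x)

print(data.frame(x, first_char_only, not_prefix))
```

On the question's sample data both tests happen to agree, because every value there begins with a doubled letter; they diverge on values like "ABCD".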

answered Oct 22 '22 00:10

TarJae

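Another possibility, using tidyr::unite and tidyr::separate. This relies on the values containing no underscore, since "_" is the default separator used by unite: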

library(tidyverse)

DF<-tibble::tribble(
  ~ID,  ~VAR1,    ~VAR1DATE,  ~VAR2,    ~VAR2DATE,
  1L, "AABB", "2001-01-01", "BBAA", "2001-01-01",
  2L, "AACC", "2001-01-02", "AACC", "2001-01-02",
  3L, "CCDD", "2001-01-03", "DDCC", "2001-01-03",
  4L, "DDAA", "2001-01-04", "CCBB", "2001-01-04",
  5L, "CCBB", "2001-01-05", "CCBB", "2001-01-05"
)

DF %>% 
  unite(VAR1,VAR1,VAR1DATE) %>% unite(VAR2,VAR2,VAR2DATE) %>% 
  mutate(across(starts_with("VAR"),~if_else(str_detect(.x, "^AA|^DD"), .x, NA_character_))) %>% 
  separate(VAR1,into = c("VAR1", "VAR1DATE"), sep = "_") %>% 
  separate(VAR2,into = c("VAR2", "VAR2DATE"), sep = "_")

#> # A tibble: 5 × 5
#>      ID VAR1  VAR1DATE   VAR2  VAR2DATE  
#>   <int> <chr> <chr>      <chr> <chr>     
#> 1     1 AABB  2001-01-01 <NA>  <NA>      
#> 2     2 AACC  2001-01-02 AACC  2001-01-02
#> 3     3 <NA>  <NA>       DDCC  2001-01-03
#> 4     4 DDAA  2001-01-04 <NA>  <NA>      
#> 5     5 <NA>  <NA>       <NA>  <NA>

answered Oct 21 '22 23:10

PaulS

Here is another tidyverse solution that uses str_detect to decide which date values to convert to NA. The same test is then applied to VAR1 and VAR2. The date columns must be handled first, while VAR1 and VAR2 still hold their original values.

library(tidyverse)

DF %>%
  mutate(
    # str_detect is vectorized, so no rowwise() is needed
    VAR1DATE = ifelse(!str_detect(VAR1, '^(AA|DD)'), NA, VAR1DATE),
    VAR2DATE = ifelse(!str_detect(VAR2, '^(AA|DD)'), NA, VAR2DATE)
  ) %>%
  mutate(across(c(VAR1, VAR2), function(x)
    ifelse(!str_detect(x, '^(AA|DD)'), NA, x)))

Output

# A tibble: 5 × 5
     ID VAR1  VAR1DATE   VAR2  VAR2DATE  
  <int> <chr> <chr>      <chr> <chr>     
1     1 AABB  2001-01-01 NA    NA        
2     2 AACC  2001-01-02 AACC  2001-01-02
3     3 NA    NA         DDCC  2001-01-03
4     4 DDAA  2001-01-04 NA    NA        
5     5 NA    NA         NA    NA

answered Oct 21 '22 23:10

AndrewGB
