How can I replace characters in a dataframe using dplyr?

Question

I have a dataframe in which one of the columns has "MISSING" values alongside numeric values, and I want to replace the "MISSING" values with NA. I know I could do it outside dplyr, but I want to keep it in the dplyr toolchain.

read.csv('data.csv', header=F) %>% 
  select(V1,V4) %>% 
  mutate(V4=replace(V4, "MISSING", "NA"))

But this throws an error:

Error in mutate_impl(.data, dots) : 
  Column `V4` must be length 30681 (the number of rows) or one, not 30682

Data

structure(list(V1 = c("01/01/1933", "01/02/1933", "01/03/1933", 
"01/04/1933", "01/05/1933"), V4 = c("MISSING", "MISSING", "MISSING", 
"MISSING", "MISSING")), .Names = c("V1", "V4"), class = c("data.table", 
"data.frame"), row.names = c(NA, -5L), .internal.selfref = <pointer: 0x10280cf78>)

CPak · Accepted Answer

You can do it without specifying the column. This replaces every "MISSING" cell in the data frame with NA:

library(dplyr)
df <- df %>% replace(.=="MISSING", NA)
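
For context on the error in the question, here is a minimal sketch with made-up values. The second argument of base R's replace() is an index, not a value to search for, so replace(V4, "MISSING", "NA") assigns to an element *named* "MISSING" and appends it to the vector, which is why the column comes back one element too long. Passing a logical mask instead keeps the length unchanged. The data frame and the name `fixed` are illustrative assumptions, not part of the original post.

```r
library(dplyr)

# Illustrative data modeled on the question (values are made up)
df <- data.frame(
  V1 = c("01/01/1933", "01/02/1933", "01/03/1933"),
  V4 = c("MISSING", "0.5", "MISSING"),
  stringsAsFactors = FALSE
)

# A character index on an unnamed vector appends a new named element,
# so the result is one element longer than the column
broken <- replace(df$V4, "MISSING", "NA")
length(broken)

# A logical mask selects the matching elements; NA (unquoted) is a real
# missing value, whereas "NA" would be an ordinary string
fixed <- df %>%
  mutate(V4 = replace(V4, V4 == "MISSING", NA))
```

This keeps the replacement inside mutate() and limits it to V4, whereas the whole-data-frame form above touches every column.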

alistaire · Answer

dplyr::na_if is designed for this purpose:

library(dplyr)

df <- structure(list(V1 = c("01/01/1933", "01/02/1933", "01/03/1933", "01/04/1933", "01/05/1933"), 
                     V4 = c("MISSING", "MISSING", "MISSING", "MISSING", "MISSING")), 
                .Names = c("V1", "V4"), class = "data.frame", row.names = c(NA, -5L))

df %>% mutate(V4 = na_if(V4, 'MISSING'))
#>           V1   V4
#> 1 01/01/1933 <NA>
#> 2 01/02/1933 <NA>
#> 3 01/03/1933 <NA>
#> 4 01/04/1933 <NA>
#> 5 01/05/1933 <NA>

Really, it's better to take care of this task on import, though, e.g. with the na.strings parameter of read.csv or data.table::fread or the na parameter of readr::read_csv.
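
A minimal sketch of the import-time approach with base read.csv, assuming a headerless four-column file like the one in the question. The temporary file and its contents are made up for illustration.

```r
library(dplyr)

# Write a small illustrative CSV to a temporary file (made-up values)
path <- tempfile(fileext = ".csv")
writeLines(c(
  "01/01/1933,a,b,MISSING",
  "01/02/1933,a,b,0.25",
  "01/03/1933,a,b,MISSING"
), path)

# na.strings lists the strings to treat as NA while reading, so V4 no
# longer contains any text and is parsed as a numeric column
df <- read.csv(path, header = FALSE, na.strings = c("MISSING", "NA")) %>%
  select(V1, V4)
```

Because the "MISSING" strings never reach the data frame, no later replacement or type conversion step is needed.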

Also, your data is currently a data.table (likely because you used fread), which has its own grammar for [. If you want to use fread but keep the result a standard data.frame, set data.table = FALSE in fread.
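
As a rough sketch of that last suggestion, assuming the data.table package is installed and using a made-up temporary file, fread can handle the missing-value strings and return a plain data.frame in one call:

```r
library(data.table)

# Write a small illustrative CSV to a temporary file (made-up values)
path <- tempfile(fileext = ".csv")
writeLines(c(
  "01/01/1933,a,b,MISSING",
  "01/02/1933,a,b,0.25",
  "01/03/1933,a,b,MISSING"
), path)

# data.table = FALSE returns a standard data.frame instead of a data.table;
# na.strings converts "MISSING" to NA while reading
df <- fread(path, header = FALSE, na.strings = "MISSING", data.table = FALSE)

class(df)
```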

Tags: r, dplyr

maximusdooku