 

Set NA to 0 in R

Tags:

r

After merging one data frame with another, I'm left with random NAs in the occasional row. I'd like to set these NAs to 0 so I can perform calculations with them.

I'm trying to do this with:

    bothbeams.data = within(bothbeams.data, {
      bothbeams.data$x.x = ifelse(is.na(bothbeams.data$x.x) == TRUE, 0, bothbeams.data$x.x)
      bothbeams.data$x.y = ifelse(is.na(bothbeams.data$x.y) == TRUE, 0, bothbeams.data$x.y)
    })

where x.x is one column and x.y is the other, but this doesn't seem to work.
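A likely reason the attempt above fails: within() evaluates the braces with the data frame's columns visible as bare variables, so writing bothbeams.data$x.x inside it builds a separate local copy of bothbeams.data instead of changing the x.x column. A minimal sketch of the within() call with bare column names, assuming a small made-up bothbeams.data (only the column names x.x and x.y come from the question). The `== TRUE` is also dropped, since is.na() already returns a logical vector.

```r
# Hypothetical stand-in for the merged data; only x.x and x.y are from the question
bothbeams.data <- data.frame(id  = 1:4,
                             x.x = c(1, NA, 3, 4),
                             x.y = c(NA, 2, NA, 8))

# Refer to columns by bare name inside within(); the modified columns
# are written back into the data frame that within() returns
bothbeams.data <- within(bothbeams.data, {
  x.x <- ifelse(is.na(x.x), 0, x.x)
  x.y <- ifelse(is.na(x.y), 0, x.y)
})

print(bothbeams.data)
```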

asked Apr 13 '12 at 10:04 by MaikelS


People also ask

How do I change an NA to 0 in R?

To replace NA with 0 in an R data frame, use is.na() to select all the NA values and assign 0 to them, e.g. myDataframe[is.na(myDataframe)] <- 0, where myDataframe is the data frame in which you would like to replace all NAs with 0.

How do I replace values with 0 in R?

You can replace NA values with zero (0) in the numeric columns of an R data frame by using is.na(), replace(), imputeTS::na_replace(), dplyr::coalesce(), dplyr::mutate_at(), dplyr::mutate_if(), and tidyr::replace_na().

How do I replace NA with a blank?

To replace NA (missing values) with a blank space or an empty string in an R data frame (data.frame), use is.na() together with replace() on the relevant columns.
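A minimal sketch of the blank-string case, assuming a small made-up data frame `dat`. Only character columns can hold "", so the replacement is restricted to them; putting "" into a numeric column would coerce the whole column to character.

```r
# Hypothetical toy data: one numeric and one character column
dat <- data.frame(n = c(1, NA, 3),
                  s = c("a", NA, "c"),
                  stringsAsFactors = FALSE)

# Pick out the character columns only
chr_cols <- vapply(dat, is.character, logical(1))

# Swap NA for "" in those columns; the numeric NA is left alone
dat[chr_cols] <- lapply(dat[chr_cols], function(col) replace(col, is.na(col), ""))

print(dat)
```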


2 Answers

You can just use the output of is.na to replace directly with subsetting:

    bothbeams.data[is.na(bothbeams.data)] <- 0

Or with a reproducible example:

    dfr <- data.frame(x = c(1:3, NA), y = c(NA, 4:6))
    dfr[is.na(dfr)] <- 0
    dfr
      x y
    1 1 0
    2 2 4
    3 3 5
    4 0 6

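However, be careful using this method on a data frame containing factors that also have missing values (since R 4.0.0, data.frame() no longer converts strings to factors by default, so stringsAsFactors = TRUE is needed to reproduce this):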

    > d <- data.frame(x = c(NA, 2, 3), y = c("a", NA, "c"), stringsAsFactors = TRUE)
    > d[is.na(d)] <- 0
    Warning message:
    In `[<-.factor`(`*tmp*`, thisvar, value = 0) :
      invalid factor level, NA generated

It "works":

    > d
      x    y
    1 0    a
    2 2 <NA>
    3 3    c

...but in this case you will probably want to alter only the numeric columns rather than the whole data frame. See, e.g., the answer below using dplyr::mutate_if.
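Staying in base R, the same is.na() subsetting idiom can be limited to the numeric columns. A minimal sketch, reusing the toy `d` from above; the helper name `num` is just illustrative.

```r
# The factor example from above
d <- data.frame(x = c(NA, 2, 3), y = c("a", NA, "c"), stringsAsFactors = TRUE)

# Logical flag per column: TRUE for numeric columns
num <- vapply(d, is.numeric, logical(1))

# Apply the is.na() subsetting only to the numeric sub-frame,
# so the factor column y (and its NA) is untouched and no warning is raised
d[num][is.na(d[num])] <- 0

print(d)
```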

answered Sep 18 '22 at 21:09 by James


A solution using mutate_all from dplyr in case you want to add that to your dplyr pipeline:

    library(dplyr)
    # funs() is deprecated in current dplyr; a ~ lambda does the same job
    df %>%
      mutate_all(~ ifelse(is.na(.), 0, .))

Result:

       A B C
    1  0 0 0
    2  1 0 0
    3  2 0 2
    4  3 0 5
    5  0 0 2
    6  0 0 1
    7  1 0 1
    8  2 0 5
    9  3 0 2
    10 0 0 4
    11 0 0 3
    12 1 0 5
    13 2 0 5
    14 3 0 0
    15 0 0 1

If you only want to replace the NAs in numeric columns, which I assume might be the case in modeling, you can use mutate_if:

    library(dplyr)
    df %>%
      mutate_if(is.numeric, ~ ifelse(is.na(.), 0, .))

or in base R:

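    # replace(x, list, values) needs all three arguments, applied per numeric column
    num_cols <- sapply(df, is.numeric)
    df[num_cols] <- lapply(df[num_cols], function(col) replace(col, is.na(col), 0))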

Result:

       A    B C
    1  0    0 0
    2  1 <NA> 0
    3  2    0 2
    4  3 <NA> 5
    5  0    0 2
    6  0 <NA> 1
    7  1    0 1
    8  2 <NA> 5
    9  3    0 2
    10 0 <NA> 4
    11 0    0 3
    12 1 <NA> 5
    13 2    0 5
    14 3 <NA> 0
    15 0    0 1

Update

With dplyr 1.0.0, across() was introduced, superseding the _all/_if/_at variants:

    library(dplyr)

    # Replace `NA` for all columns
    df %>%
      mutate(across(everything(), ~ ifelse(is.na(.), 0, .)))

    # Replace `NA` for numeric columns
    df %>%
      mutate(across(where(is.numeric), ~ ifelse(is.na(.), 0, .)))
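One design note on the ifelse() lambdas used throughout this answer: ifelse() builds its result starting from the logical test, so the output type comes from coercing the yes/no values together, and attributes such as a Date class are dropped. replace() assigns into a copy of the original vector instead, so it keeps the column's class. A small base R sketch on made-up vectors:

```r
x <- c(1L, NA, 3L)

# ifelse() promotes the integer column to double because 0 is a double
via_ifelse <- ifelse(is.na(x), 0, x)

# replace() keeps integer storage when the fill value is the integer 0L
via_replace <- replace(x, is.na(x), 0L)

class(via_ifelse)   # "numeric"
class(via_replace)  # "integer"

# With a Date column, ifelse() loses the class entirely
dates <- as.Date(c("2024-01-01", NA))
fill  <- as.Date("2000-01-01")
class(ifelse(is.na(dates), fill, dates))   # "numeric"
class(replace(dates, is.na(dates), fill))  # "Date"
```

Inside across(), the same idea can be written as `~ replace(.x, is.na(.x), 0)` if preserving column types matters.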

Data:

    set.seed(123)
    df <- data.frame(A = rep(c(0:3, NA), 3),
                     B = rep(c("0", NA), length.out = 15),
                     C = sample(c(0:5, NA), 15, replace = TRUE))
answered Sep 17 '22 at 21:09 by acylam