
Quick replace of NA - an error or warning

Tags: dataframe, r, na

I have a big data.frame called "mat" with 49952 obs. of 7597 variables, and I'm trying to replace the NAs with zeros. Here is an example of what my data.frame looks like:

    A   B   C   E   F   D   Q   Z   . . .
1   1   1   0   NA  NA  0   NA  NA
2   0   0   1   NA  NA  0   NA  NA
3   0   0   0   NA  NA  1   NA  NA
4   NA  NA  NA  NA  NA  NA  NA  NA
5   0   1   0   1   NA  0   NA  NA 
6   1   1   1   0   NA  0   NA  NA
7   0   0   1   0   NA  1   NA  NA 
.
.
.

I need a really fast way to replace them. The result should look like:

    A   B   C   E   F   D   Q   Z   . . .
1   1   1   0   0   0   0   0   0
2   0   0   1   0   0   0   0   0 
3   0   0   0   0   0   1   0   0
4   0   0   0   0   0   0   0   0
5   0   1   0   1   0   0   0   0 
6   1   1   1   0   0   0   0   0
7   0   0   1   0   0   1   0   0 
.
.
.

I already tried lapply(mat, function(x){replace(x, is.na(x),0)}), which didn't work; mat[is.na(mat)] <- 0, which produced an error and may also be too slow; and also link, which didn't work either.

@Sotos already suggested plyr::rbind.fill(lapply(L, as.data.frame)), but it didn't work: it produces a data.frame of 379485344 observations and 1 variable (which is 49952 x 7597), so I would also have to transform it back. Is there a better way to do this?

The real structure of my data.frame:

> str(mat)
'data.frame':   49952 obs. of  7597 variables:
 $ 6794602   : num  1 NA NA NA NA 0 0 0 0 0 ...
 $ 1008667   : num  NA 1 0 NA NA 0 0 0 0 0 ...
 $ 8009082   : num  NA 0 1 NA NA NA NA NA NA NA ...
 $ 6740421   : num  NA NA NA 1 NA 0 0 0 0 0 ...
 $ 6777805   : num  NA NA NA NA 1 NA NA NA NA NA ...
 $ 1001682   : num  NA NA NA NA NA 0 0 0 0 0 ...
 $ 1001990   : num  NA NA NA NA NA 0 0 0 0 0 ...
 $ 1002541   : num  NA NA NA NA NA 0 0 0 0 0 ...
 $ 1002790   : num  NA NA NA NA NA 0 0 0 0 0 ...

Note:

When I tried mat[is.na(mat)] <- 0, there was a warning:

> mat[is.na(mat)] <- 0
Warning messages:
1: In `[<-.factor`(`*tmp*`, thisvar, value = 0) :
  invalid factor level, NA generated
2: In `[<-.factor`(`*tmp*`, thisvar, value = 0) :
  invalid factor level, NA generated
> nlevels(mat)
[1] 0

Data.frame mat after using mat[is.na(mat)] <- 0:

> str(mat)
'data.frame':   49952 obs. of  7597 variables:
 $ 6794602   : num  1 0 0 0 0 0 0 0 0 0 ...
 $ 1008667   : num  0 1 0 0 0 0 0 0 0 0 ...
 $ 8009082   : num  0 0 1 0 0 0 0 0 0 0 ...
 $ 6740421   : num  0 0 0 1 0 0 0 0 0 0 ...
 $ 6777805   : num  0 0 0 0 1 0 0 0 0 0 ...
 $ 1001682   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ 1001990   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ 1002541   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ 1002790   : num  0 0 0 0 0 0 0 0 0 0 ...

So the questions are:

  1. Is there any other fast way to replace the NAs?
  2. Is the warning a big deal? The data after running mat[is.na(mat)] <- 0 looks like what I want, but there are too many values for me to check that they are all correct.
Martina Zapletalová asked Aug 08 '17 17:08


2 Answers

Try the following:

mat %>% replace(is.na(.), 0)
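Note that %>% is not part of base R; it comes from the magrittr package (dplyr re-exports it). A minimal sketch on a small stand-in data.frame (the values are made up for illustration), showing the base R call that the pipe expands to and, if magrittr happens to be installed, the piped form:

```r
# Small stand-in for the real data (hypothetical values).
mat <- data.frame(A = c(1, NA, 0), B = c(NA, NA, 1))

# Base R equivalent of the piped call: replace() works on a copy of mat
# and assigns 0 wherever the logical mask is.na(mat) is TRUE.
res_base <- replace(mat, is.na(mat), 0)

# The piped form needs magrittr (or dplyr) to be loaded first.
if (requireNamespace("magrittr", quietly = TRUE)) {
  library(magrittr)
  res_pipe <- mat %>% replace(is.na(.), 0)
  stopifnot(identical(res_base, res_pipe))
}

res_base
```

Because this goes through the same data.frame assignment as mat[is.na(mat)] <- 0, it should behave the same way on factor columns, including the warning.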
Sagar answered Nov 12 '22 11:11


If you suspect that some of your columns are factors, you can use the following code to detect them and convert them to numeric.

inx <- sapply(mat, inherits, "factor")
mat[inx] <- lapply(mat[inx], function(x) as.numeric(as.character(x)))
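To see why the conversion goes through as.character(), and what the "invalid factor level" warning implies, here is a small sketch on a made-up factor (the vector f is hypothetical, not taken from the question's data). It assumes the factor does not already have "0" among its levels:

```r
# Hypothetical factor column; "0" is not one of its levels.
f <- factor(c("10", NA, "20", "10"))

# as.numeric() on a factor returns the internal level codes ...
codes <- as.numeric(f)                  # 1 NA 2 1

# ... so convert to character first to recover the original values.
values <- as.numeric(as.character(f))   # 10 NA 20 10

# Assigning 0 to a factor without a "0" level triggers the
# "invalid factor level, NA generated" warning and leaves the NA in place.
g <- f
suppressWarnings(g[is.na(g)] <- 0)
anyNA(g)                                # TRUE: the NA was not replaced
```

So if the warning in the question does come from factor columns, the NAs in those columns are probably still there after mat[is.na(mat)] <- 0. Also, nlevels(mat) is called on the data.frame itself rather than on its columns, so a result of 0 does not rule out factor columns.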

Then try the following.

mat[] <- lapply(mat, function(x) {x[is.na(x)] <- 0; x})
mat
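The empty brackets in mat[] matter here. A short sketch with a made-up two-column data.frame and a hypothetical helper fill0, comparing a bare lapply() call (possibly why the attempt in the question appeared not to work) with the mat[] <- form:

```r
# Small stand-in for the real data (hypothetical values).
mat <- data.frame(A = c(1, NA, 0), B = c(NA, NA, 1))

# Hypothetical helper: replace the NAs of one column with 0.
fill0 <- function(x) { x[is.na(x)] <- 0; x }

# A bare lapply() returns a plain list and leaves mat untouched.
as_list <- lapply(mat, fill0)
class(as_list)    # "list"
anyNA(mat)        # TRUE, mat still has its NAs

# Assigning into mat[] writes the columns back into the existing
# data.frame, keeping its class, column names and row names.
mat[] <- lapply(mat, fill0)
class(mat)        # "data.frame"
anyNA(mat)        # FALSE
```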

And here's the data.

mat <-
structure(list(A = c(1L, 0L, 0L, NA, 0L, 1L, 0L), B = c(1L, 0L, 
0L, NA, 1L, 1L, 0L), C = c(0L, 1L, 0L, NA, 0L, 1L, 1L), E = c(NA, 
NA, NA, NA, 1L, 0L, 0L), F = c(NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), D = c(0L, 0L, 1L, NA, 
0L, 0L, 1L), Q = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_), Z = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_)), .Names = c("A", "B", "C", "E", 
"F", "D", "Q", "Z"), row.names = c("1", "2", "3", "4", "5", "6", 
"7"), class = "data.frame")
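With a data.frame too large to inspect by eye, the result can be checked programmatically instead. A minimal sketch on made-up data (the values and the names na_before and col_classes are illustrative), using only base R:

```r
# Small stand-in for the real data (hypothetical values).
mat <- data.frame(A = c(1, NA, 0), B = c(NA, 1, NA))

# Count the NAs per column before replacing them.
na_before <- colSums(is.na(mat))

mat[] <- lapply(mat, function(x) { x[is.na(x)] <- 0; x })

# anyNA() is a quick check that no NA is left anywhere.
stopifnot(!anyNA(mat))

# Tabulate the column classes to spot unexpected factor columns.
col_classes <- table(sapply(mat, class))
col_classes
```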
Rui Barradas answered Nov 12 '22 12:11