I have a big data.frame called "mat" of 49952 obs. of 7597 variables and I'm trying to replace NAs with zeros. Here is and example how my data.frame looks like:
A B C E F D Q Z . . .
1 1 1 0 NA NA 0 NA NA
2 0 0 1 NA NA 0 NA NA
3 0 0 0 NA NA 1 NA NA
4 NA NA NA NA NA NA NA NA
5 0 1 0 1 NA 0 NA NA
6 1 1 1 0 NA 0 NA NA
7 0 0 1 0 NA 1 NA NA
.
.
.
I need realy fast tool to replace them. The result should look like:
A B C E F D Q Z . . .
1 1 1 0 0 0 0 0 0
2 0 0 1 0 0 0 0 0
3 0 0 0 0 0 1 0 0
4 0 0 0 0 0 0 0 0
5 0 1 0 1 0 0 0 0
6 1 1 1 0 0 0 0 0
7 0 0 1 0 0 1 0 0
.
.
.
I already tried lapply(mat, function(x){replace(x, is.na(x),0)})
- didn't work - mat[is.na(mat)] <- 0
- error and and maybe too slow - and also link - didn't work too.
@Sotos already advised me plyr::rbind.fill(lapply(L, as.data.frame))
but it didn't work, because it makes data.frame of 379485344 observations and 1 variable (which is 49952x7597) so I have to also trafnsform it back. Is there any better way to do this?
The real structure of my data.frame:
> str(mat)
'data.frame': 49952 obs. of 7597 variables:
$ 6794602 : num 1 NA NA NA NA 0 0 0 0 0 ...
$ 1008667 : num NA 1 0 NA NA 0 0 0 0 0 ...
$ 8009082 : num NA 0 1 NA NA NA NA NA NA NA ...
$ 6740421 : num NA NA NA 1 NA 0 0 0 0 0 ...
$ 6777805 : num NA NA NA NA 1 NA NA NA NA NA ...
$ 1001682 : num NA NA NA NA NA 0 0 0 0 0 ...
$ 1001990 : num NA NA NA NA NA 0 0 0 0 0 ...
$ 1002541 : num NA NA NA NA NA 0 0 0 0 0 ...
$ 1002790 : num NA NA NA NA NA 0 0 0 0 0 ...
Note:
when I tried mat[is.na(mat)] <- 0
there was a warning:
> mat[is.na(mat)] <- 0
Warning messages:
1: In `[<-.factor`(`*tmp*`, thisvar, value = 0) :
invalid factor level, NA generated
2: In `[<-.factor`(`*tmp*`, thisvar, value = 0) :
invalid factor level, NA generated
> nlevels(mat)
[1] 0
Data.frame mat after using mat[is.na(mat)] <- 0
:
> str(mat)
'data.frame': 49952 obs. of 7597 variables:
$ 6794602 : num 1 0 0 0 0 0 0 0 0 0 ...
$ 1008667 : num 0 1 0 0 0 0 0 0 0 0 ...
$ 8009082 : num 0 0 1 0 0 0 0 0 0 0 ...
$ 6740421 : num 0 0 0 1 0 0 0 0 0 0 ...
$ 6777805 : num 0 0 0 0 1 0 0 0 0 0 ...
$ 1001682 : num 0 0 0 0 0 0 0 0 0 0 ...
$ 1001990 : num 0 0 0 0 0 0 0 0 0 0 ...
$ 1002541 : num 0 0 0 0 0 0 0 0 0 0 ...
$ 1002790 : num 0 0 0 0 0 0 0 0 0 0 ...
So the questions are:
mat[is.na(mat)] <- 0
looks like what I want, but there are too many values, so I can't check if they are all right.The classic way to replace NA's in R is by using the IS.NA() function. The IS.NA() function takes a vector or data frame as input and returns a logical object that indicates whether a value is missing (TRUE or VALUE). Next, you can use this logical object to create a subset of the missing values and assign them a zero.
locf() function from the zoo package to carry the last observation forward to replace your NA values.
replace_na() will not work if the variable is a factor, and the replacement is not already a level for your factor. If this is the issue, you can add another level to your factor variable for 0 before running replace_na(), or you can convert the variable to numeric or character first.
If suspect that some of your columns are factor, you can use the following code to detect and change them to numeric.
inx <- sapply(mat, inherits, "factor")
mat[inx] <- lapply(mat[inx], function(x) as.numeric(as.character(x)))
Then try the following.
mat[] <- lapply(mat, function(x) {x[is.na(x)] <- 0; x})
mat
And here's the data.
mat <-
structure(list(A = c(1L, 0L, 0L, NA, 0L, 1L, 0L), B = c(1L, 0L,
0L, NA, 1L, 1L, 0L), C = c(0L, 1L, 0L, NA, 0L, 1L, 1L), E = c(NA,
NA, NA, NA, 1L, 0L, 0L), F = c(NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), D = c(0L, 0L, 1L, NA,
0L, 0L, 1L), Q = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), Z = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_)), .Names = c("A", "B", "C", "E",
"F", "D", "Q", "Z"), row.names = c("1", "2", "3", "4", "5", "6",
"7"), class = "data.frame")
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With