Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Remove columns with NA's and/or Zeros Only

Tags:

r

na

zero

I have a sample dataset which looks something similar to the one below:

d= data.frame(a = c(1,5,56,4,9), 
              b = c(0,0,NA,0,NA), 
              c = c(98,67,NA,3,7), 
              d = c(0,0,0,0,0), 
              e = c(NA,NA,NA,NA,NA))

which would be:


| a  |  b |  c | d | e  |
|----|:--:|---:|---|----|
| 1  |  0 | 98 | 0 | NA |
| 5  |  0 | 67 | 0 | NA |
| 56 | NA | NA | 0 | NA |
| 4  | 0  | 3  | 0 | NA |
| 9  | NA | 7  | 0 | NA |

I need to remove all such columns which have:

1. NA's and Zeros
2. Only Zeros
3. Only NA's

So based on the above dataset, columns b,d and e should be eliminated. So, I first need to find out which columns have such conditions and then delete them.

I went through this link Remove the columns with the colsums=0 but I'm not clear with the solution. Also, it doesn't provide me the desired output.

The final output would be:

| a  |  c |
|----|:--:|
| 1  | 98 |
| 5  | 67 |
| 56 | NA |
| 4  | 3  |
| 9  | 7  |
like image 874
hk2 Avatar asked Sep 24 '19 20:09

hk2


4 Answers

One option would be to create a logical vector with colSums based on the number of NA or 0 elements in each column

d[!colSums(is.na(d)|d ==0) == nrow(d)]
#  a    c
#1  1   98
#2  5   67
#3 56   NA
#4  4    3
#5  9    7

Or another option is to replace all the 0s to NA and then apply is.na

d[colSums(!is.na(replace(d, d == 0, NA))) > 0]

Or more compactly with na_if

d[colSums(!is.na(na_if(d, 0))) > 0]
like image 92
akrun Avatar answered Sep 19 '22 15:09

akrun


In base and assuming that we have different type of columns:

as.data.frame(Filter(function(x) !all(x=="NA" | x == "0"), {lapply(d, as.character)}))

Using dplyr:

library(dplyr)

d %>% 
  mutate_all(as.character) %>% 
  select(which(colSums(abs(.), na.rm = T) != 0))

Output:

#>    a  c
#> 1  1 98
#> 2  5 67
#> 3 56 NA
#> 4  4  3
#> 5  9  7
like image 39
M-- Avatar answered Sep 17 '22 15:09

M--


We can use apply column-wise and remove columns which has all, NA or 0's.

d[!apply(d == 0 | is.na(d), 2, all)]

#   a  c
#1  1 98
#2  5 67
#3 56 NA
#4  4  3
#5  9  7
like image 34
Ronak Shah Avatar answered Sep 20 '22 15:09

Ronak Shah


Very strange to store NAs and 0 as strings, but there you go...

bad_column <- function(z) {
  all(z %in% c("NA", "0"))
}

d[, !sapply(d, bad_column), drop = FALSE]
like image 29
Michael M Avatar answered Sep 17 '22 15:09

Michael M