Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Removing both row and column of partial NA value

Tags:

dataframe

r

na

row

col

I have the following dataframe (s):

s<-read.table(text = "V1    V2  V3  V4  V5  V6  V7  V8  V9  V10 
  1 0   62  64  44  NA  55  81  66  57  53  
  2 0   0   65  50  NA  56  79  69  52  55  
  3 0   0   0   57  NA  62  84  76  65  59  
  4 0   0   0   0   NA  30  70  61  41  36  
  5 0   0   0   0   NA  NA  NA  NA  NA  NA  
  6 0   0   0   0   0   0   66  63  51  44  
  7 0   0   0   0   0   0   0   80  72  72  
  8 0   0   0   0   0   0   0   0   68  64  
  9 0   0   0   0   0   0   0   0   0   47  
  10    0   0   0   0   0   0   0   0   0   0   ", header = TRUE)

As can be seen row 5 and column 5 in this case includes only NA and 0 values. I would like to omit them and to keep the order of lines and columns. There might be more column and rows in the same pattern and I would like to do the same. The size of the dataframe might be changed. The final result would be:

    V1  V2  V3  V4  V6  V7  V8  V9  V10 
1   0   62  64  44  55  81  66  57  53  
2   0   0   65  50  56  79  69  52  55  
3   0   0   0   57  62  84  76  65  59  
4   0   0   0   0   30  70  61  41  36  
6   0   0   0   0   0   66  63  51  44  
7   0   0   0   0   0   0   80  72  72  
8   0   0   0   0   0   0   0   68  64  
9   0   0   0   0   0   0   0   0   47  
10  0   0   0   0   0   0   0   0   0   

Is there a way to get the omitted row and column number (in this case 5), as well?

like image 923
Avi Avatar asked Apr 26 '16 18:04

Avi


3 Answers

You have to define more on when exactly you want to drop. In this case it looks like matrix at one side and diagonal always being 0.

However, In general, this is what I use

s[!rowSums(is.na(s))>1,!colSums(is.na(s))>1]

Considering 0's

s[!rowSums(is.na(s)|s==0)>9,!colSums(is.na(s)|s==0)>9]
like image 81
Ananta Avatar answered Nov 15 '22 02:11

Ananta


We can try

v1 <- colSums(is.na(s))
v2 <- colSums(s==0, na.rm=TRUE)
j1 <- !(v1>0 & (v1+v2)==nrow(s) & v2 >0)

v3 <- rowSums(is.na(s))
v4 <- rowSums(s==0, na.rm=TRUE)
i1 <- !(v3>0 & (v3+v4)==ncol(s) & v3 >0)
s[i1, j1]
#   V1 V2 V3 V4 V6 V7 V8 V9 V10
#1   0 62 64 44 55 81 66 57  53
#2   0  0 65 50 56 79 69 52  55
#3   0  0  0 57 62 84 76 65  59
#4   0  0  0  0 30 70 61 41  36
#6   0  0  0  0  0 66 63 51  44
#7   0  0  0  0  0  0 80 72  72
#8   0  0  0  0  0  0  0 68  64
#9   0  0  0  0  0  0  0  0  47
#10  0  0  0  0  0  0  0  0   0

Suppose if we change one of the values in 's'

 s$V7[3] <- NA

By running the above code, the output will be

#   V1 V2 V3 V4 V6 V7 V8 V9 V10
#1   0 62 64 44 55 81 66 57  53
#2   0  0 65 50 56 79 69 52  55
#3   0  0  0 57 62 NA 76 65  59
#4   0  0  0  0 30 70 61 41  36
#6   0  0  0  0  0 66 63 51  44
#7   0  0  0  0  0  0 80 72  72
#8   0  0  0  0  0  0  0 68  64
#9   0  0  0  0  0  0  0  0  47
#10  0  0  0  0  0  0  0  0   0

NOTE: The OP's condition is includes only NA and 0 values. I would like to omit them

like image 36
akrun Avatar answered Nov 15 '22 00:11

akrun


I was going to suggest:

sclean <- s[rowSums(s == 0|is.na(s)) != ncol(s) | (rowSums(s == 0, na.rm=TRUE) == ncol(s)),
        colSums(s == 0|is.na(s) )!= nrow(s) | colSums(s == 0, na.rm=TRUE) == nrow(s)]
like image 22
Serban Tanasa Avatar answered Nov 15 '22 02:11

Serban Tanasa