I have the following data frame (s):
s<-read.table(text = "V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 0 62 64 44 NA 55 81 66 57 53
2 0 0 65 50 NA 56 79 69 52 55
3 0 0 0 57 NA 62 84 76 65 59
4 0 0 0 0 NA 30 70 61 41 36
5 0 0 0 0 NA NA NA NA NA NA
6 0 0 0 0 0 0 66 63 51 44
7 0 0 0 0 0 0 0 80 72 72
8 0 0 0 0 0 0 0 0 68 64
9 0 0 0 0 0 0 0 0 0 47
10 0 0 0 0 0 0 0 0 0 0 ", header = TRUE)
As can be seen, row 5 and column 5 in this case include only NA and 0 values. I would like to omit them while keeping the order of the rows and columns. There might be more columns and rows with the same pattern, and I would like to handle them the same way. The size of the data frame may change.
The final result would be:
V1 V2 V3 V4 V6 V7 V8 V9 V10
1 0 62 64 44 55 81 66 57 53
2 0 0 65 50 56 79 69 52 55
3 0 0 0 57 62 84 76 65 59
4 0 0 0 0 30 70 61 41 36
6 0 0 0 0 0 66 63 51 44
7 0 0 0 0 0 0 80 72 72
8 0 0 0 0 0 0 0 68 64
9 0 0 0 0 0 0 0 0 47
10 0 0 0 0 0 0 0 0 0
Is there a way to get the omitted row and column number (in this case 5), as well?
You need to define more precisely when you want to drop a row or column. In this case it looks like a matrix filled on one side, with the diagonal always being 0.
In general, however, this is what I use:
s[!rowSums(is.na(s))>1,!colSums(is.na(s))>1]
Considering 0's as well (the threshold of 9 is specific to this 10 x 10 example):
s[!rowSums(is.na(s)|s==0)>9,!colSums(is.na(s)|s==0)>9]
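The hard-coded 9 above only fits a 10 x 10 data frame. As a rough sketch (the names na_or_zero, drop_rows and drop_cols are made up for illustration), the same test can be written against ncol(s) and nrow(s) so it follows the size of the data. Note that this rule treats a row or column of pure zeros the same as one mixing NA and 0, so on the example data it removes more than the question asks for.

```r
s <- read.table(text = "V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 0 62 64 44 NA 55 81 66 57 53
2 0 0 65 50 NA 56 79 69 52 55
3 0 0 0 57 NA 62 84 76 65 59
4 0 0 0 0 NA 30 70 61 41 36
5 0 0 0 0 NA NA NA NA NA NA
6 0 0 0 0 0 0 66 63 51 44
7 0 0 0 0 0 0 0 80 72 72
8 0 0 0 0 0 0 0 0 68 64
9 0 0 0 0 0 0 0 0 0 47
10 0 0 0 0 0 0 0 0 0 0", header = TRUE)

# TRUE wherever a cell is NA or 0 (TRUE | NA evaluates to TRUE, so no NAs remain)
na_or_zero <- is.na(s) | s == 0

# a row/column is dropped when every one of its cells is NA or 0
drop_rows <- rowSums(na_or_zero) == ncol(s)
drop_cols <- colSums(na_or_zero) == nrow(s)

res <- s[!drop_rows, !drop_cols]

which(drop_rows)  # positions 5 and 10: the all-zero row 10 is dropped too
which(drop_cols)  # V1 and V5: the all-zero column V1 is dropped too
```

If all-zero rows and columns have to survive, as in the expected result of the question, the condition needs an extra check for at least one NA.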
We can try
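# per column: number of NAs and number of zeros
v1 <- colSums(is.na(s))
v2 <- colSums(s==0, na.rm=TRUE)
# keep a column unless it holds only NAs and zeros, with at least one of each
j1 <- !(v1>0 & (v1+v2)==nrow(s) & v2 >0)
# same test per row
v3 <- rowSums(is.na(s))
v4 <- rowSums(s==0, na.rm=TRUE)
i1 <- !(v3>0 & (v3+v4)==ncol(s) & v4 >0)
s[i1, j1]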
# V1 V2 V3 V4 V6 V7 V8 V9 V10
#1 0 62 64 44 55 81 66 57 53
#2 0 0 65 50 56 79 69 52 55
#3 0 0 0 57 62 84 76 65 59
#4 0 0 0 0 30 70 61 41 36
#6 0 0 0 0 0 66 63 51 44
#7 0 0 0 0 0 0 80 72 72
#8 0 0 0 0 0 0 0 68 64
#9 0 0 0 0 0 0 0 0 47
#10 0 0 0 0 0 0 0 0 0
Suppose we change one of the values in 's':
s$V7[3] <- NA
Running the above code again, the output will be
# V1 V2 V3 V4 V6 V7 V8 V9 V10
#1 0 62 64 44 55 81 66 57 53
#2 0 0 65 50 56 79 69 52 55
#3 0 0 0 57 62 NA 76 65 59
#4 0 0 0 0 30 70 61 41 36
#6 0 0 0 0 0 66 63 51 44
#7 0 0 0 0 0 0 80 72 72
#8 0 0 0 0 0 0 0 68 64
#9 0 0 0 0 0 0 0 0 47
#10 0 0 0 0 0 0 0 0 0
NOTE: The OP's condition is "includes only NA and 0 values. I would like to omit them".
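The question also asks for the numbers of the omitted row and column. One possible way, sketched here on the original data, is to build the "drop" conditions directly (the complements of i1 and j1) and pass them to which(). The names drop_row, drop_col, omitted_rows and omitted_cols are only illustrative.

```r
s <- read.table(text = "V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 0 62 64 44 NA 55 81 66 57 53
2 0 0 65 50 NA 56 79 69 52 55
3 0 0 0 57 NA 62 84 76 65 59
4 0 0 0 0 NA 30 70 61 41 36
5 0 0 0 0 NA NA NA NA NA NA
6 0 0 0 0 0 0 66 63 51 44
7 0 0 0 0 0 0 0 80 72 72
8 0 0 0 0 0 0 0 0 68 64
9 0 0 0 0 0 0 0 0 0 47
10 0 0 0 0 0 0 0 0 0 0", header = TRUE)

# columns: only NAs and zeros, with at least one of each
v1 <- colSums(is.na(s))
v2 <- colSums(s == 0, na.rm = TRUE)
drop_col <- v1 > 0 & v2 > 0 & (v1 + v2) == nrow(s)

# rows: same condition
v3 <- rowSums(is.na(s))
v4 <- rowSums(s == 0, na.rm = TRUE)
drop_row <- v3 > 0 & v4 > 0 & (v3 + v4) == ncol(s)

# positions of what gets removed (unname() strips the row/column labels)
omitted_rows <- unname(which(drop_row))
omitted_cols <- unname(which(drop_col))

# the cleaned data frame, same as s[i1, j1]
cleaned <- s[!drop_row, !drop_col]
```

Using names(which(drop_col)) instead would give the column label ("V5") rather than its position.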
I was going to suggest:
sclean <- s[rowSums(s == 0 | is.na(s)) != ncol(s) | rowSums(s == 0, na.rm = TRUE) == ncol(s),
            colSums(s == 0 | is.na(s)) != nrow(s) | colSums(s == 0, na.rm = TRUE) == nrow(s)]
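Since the size of the data frame may change, the same rule can be wrapped in a small helper. This is only a sketch: the function name drop_na_zero_only and the toy data frame small are invented for illustration. It keeps a row or column unless it consists solely of NA and 0 values with at least one NA, and uses drop = FALSE so the result stays a data frame even if a single row or column remains.

```r
drop_na_zero_only <- function(df) {
  na_or_zero <- is.na(df) | df == 0

  # rows/columns made of zeros only are kept on purpose
  all_zero_row <- rowSums(df == 0, na.rm = TRUE) == ncol(df)
  all_zero_col <- colSums(df == 0, na.rm = TRUE) == nrow(df)

  keep_row <- rowSums(na_or_zero) != ncol(df) | all_zero_row
  keep_col <- colSums(na_or_zero) != nrow(df) | all_zero_col

  df[keep_row, keep_col, drop = FALSE]
}

# toy 3 x 3 example: row 2 and column C contain only NA and 0
small <- data.frame(A = c(0, 0, 0),
                    B = c(5, NA, 0),
                    C = c(NA, NA, 0))

small_clean <- drop_na_zero_only(small)
# rows 1 and 3 and columns A and B remain
```

Called as drop_na_zero_only(s) on the data from the question, it should select the same rows and columns as sclean.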