How to delete rows where all the columns are zero

I have the following data frame:

dat <- data.frame(a = c(0,0,2,3), b= c(1,0,0,0), c=c(0,0,1,3))

Which prints:

> dat 
  a b c
1 0 1 0
2 0 0 0
3 2 0 1
4 3 0 3

I want to remove the rows in which every column is zero, resulting in this:

  a b c
1 0 1 0 
3 2 0 1
4 3 0 3

How can I achieve that?

I tried this but failed:

> row_sub = apply(dat, 1, function(row) all(row !=0 ))
> dat[row_sub,]
[1] a b c
<0 rows> (or 0-length row.names)
neversaint asked Dec 06 '22 04:12


1 Answer

You can use (1):

dat[as.logical(rowSums(dat != 0)), ]

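This counts the nonzero entries in each row instead of summing the values, so it works for both positive and negative values: a row such as -1, 1 sums to zero but is still kept.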
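To see why (1) works and why the attempt in the question returns no rows, it can help to evaluate the expression step by step. The following is only a sketch on the example data; the intermediate names (`nonzero`, `counts`, `keep`, `attempt`, `fixed`) are introduced here for illustration.

```r
dat <- data.frame(a = c(0, 0, 2, 3), b = c(1, 0, 0, 0), c = c(0, 0, 1, 3))

# Logical matrix: TRUE wherever a cell is nonzero
nonzero <- dat != 0

# Number of nonzero cells per row: 1, 0, 2, 2
counts <- rowSums(nonzero)

# as.logical() maps 0 to FALSE and any other count to TRUE
keep <- as.logical(counts)
result <- dat[keep, ]

# The attempt from the question keeps rows where ALL values are nonzero.
# Every row of dat contains at least one zero, so nothing is selected.
attempt <- apply(dat, 1, function(row) all(row != 0))

# Asking for ANY nonzero value gives the intended selection.
fixed <- apply(dat, 1, function(row) any(row != 0))
```

Here `keep` and `fixed` select the same rows (1, 3 and 4), while `attempt` is `FALSE` for every row.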

Another possibility, slightly faster than (1) on large data frames in the benchmarks below, is (2):

dat[rowSums(!as.matrix(dat)) < ncol(dat), ]
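Approach (2) takes the opposite view: it counts the zeros in each row and keeps the rows that have fewer zeros than columns. A minimal sketch of the intermediate steps, with illustrative names (`m`, `is_zero`, `zero_counts`, `keep`) that are not part of the original answer:

```r
dat <- data.frame(a = c(0, 0, 2, 3), b = c(1, 0, 0, 0), c = c(0, 0, 1, 3))

# Convert once to a numeric matrix
m <- as.matrix(dat)

# For numeric values, !x is TRUE exactly where x == 0
is_zero <- !m

# Number of zeros per row: 2, 3, 1, 1
zero_counts <- rowSums(is_zero)

# A row is all-zero only if its zero count equals the number of columns
keep <- zero_counts < ncol(dat)
result <- dat[keep, ]
```

Only row 2 has as many zeros as there are columns, so it is the only row dropped.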

An approach that performs well for both short and long data frames is to use matrix multiplication (3):

dat[as.logical(abs(as.matrix(dat)) %*% rep(1L, ncol(dat))), ]
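Multiplying the matrix of absolute values by a vector of ones is another way of computing the row sums, which is what (3) relies on. The sketch below spells this out on the example data; the names `m`, `ones` and `row_totals` are assumptions made for illustration.

```r
dat <- data.frame(a = c(0, 0, 2, 3), b = c(1, 0, 0, 0), c = c(0, 0, 1, 3))

# Absolute values, so positive and negative entries cannot cancel
m <- abs(as.matrix(dat))

# One 1 per column: m %*% ones adds up each row
ones <- rep(1L, ncol(dat))
row_totals <- m %*% ones   # 4 x 1 matrix holding 1, 0, 3, 6

# The product matches rowSums() on the same matrix
same_as_rowsums <- isTRUE(all.equal(as.vector(row_totals), unname(rowSums(m))))

# A total of 0 becomes FALSE, anything else TRUE
keep <- as.logical(row_totals)
result <- dat[keep, ]
```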

Some benchmarks:

# the original dataset
dat <- data.frame(a = c(0,0,2,3), b= c(1,0,0,0), c=c(0,0,1,3))

Codoremifa <- function() dat[rowSums(abs(dat)) != 0,]
Marco <- function() dat[!apply(dat, 1, function(x) all(x == 0)), ]
Sven <- function() dat[as.logical(rowSums(dat != 0)), ]
Sven_2 <- function() dat[rowSums(!as.matrix(dat)) < ncol(dat), ]
Sven_3 <- function() dat[as.logical(abs(as.matrix(dat)) %*% rep(1L,ncol(dat))), ]

library(microbenchmark)
microbenchmark(Codoremifa(), Marco(), Sven(), Sven_2(), Sven_3())
# Unit: microseconds
#          expr     min       lq   median       uq     max neval
#  Codoremifa() 267.772 273.2145 277.1015 284.0995 1190.197   100
#       Marco() 192.509 198.4190 201.2175 208.9925  265.594   100
#        Sven() 143.372 147.7260 150.0585 153.9455  227.031   100
#      Sven_2() 152.080 155.1900 156.9000 161.5650  214.591   100
#      Sven_3() 146.793 151.1460 153.3235 157.9885  187.845   100


# a data frame with 10,000 rows
set.seed(1)
dat <- dat[sample(nrow(dat), 10000, TRUE), ]
microbenchmark(Codoremifa(), Marco(), Sven(), Sven_2(), Sven_3())
# Unit: milliseconds
#          expr       min        lq    median        uq        max neval
#   Codoremifa()  2.426419  2.471204  3.488017  3.750189  84.268432   100
#        Marco() 36.268766 37.840246 39.406751 40.791321 119.233175   100
#         Sven()  2.145587  2.184150  2.205299  2.270764  83.055534   100
#       Sven_2()  2.007814  2.048711  2.077167  2.207942  84.944856   100
#       Sven_3()  1.814994  1.844229  1.861022  1.917779   4.452892   100
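Timings are only comparable if the functions return the same rows. The following sketch checks that on the original four-row data frame. Unlike the benchmark functions above, these variants take the data frame as an argument `d`, and the list name `approaches` is an assumption made here so the block is self-contained.

```r
dat <- data.frame(a = c(0, 0, 2, 3), b = c(1, 0, 0, 0), c = c(0, 0, 1, 3))

# The five expressions from the benchmark, parameterised on the data frame
approaches <- list(
  Codoremifa = function(d) d[rowSums(abs(d)) != 0, ],
  Marco      = function(d) d[!apply(d, 1, function(x) all(x == 0)), ],
  Sven       = function(d) d[as.logical(rowSums(d != 0)), ],
  Sven_2     = function(d) d[rowSums(!as.matrix(d)) < ncol(d), ],
  Sven_3     = function(d) d[as.logical(abs(as.matrix(d)) %*% rep(1L, ncol(d))), ]
)

# Apply every approach to the same input
results <- lapply(approaches, function(f) f(dat))

# Compare each result with the first one
all_same <- all(vapply(
  results,
  function(r) isTRUE(all.equal(r, results[[1]])),
  logical(1)
))
```

If `all_same` is `TRUE`, the approaches agree on this input and differ only in speed.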
Sven Hohenstein answered Dec 11 '22 09:12