Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Drop rows with all zeros in pandas data frame

Tags:

python

pandas

People also ask

How do I drop a row if all NaN pandas?

By using dropna() method you can drop rows with NaN (Not a Number) and None values from pandas DataFrame. Note that by default it returns the copy of the DataFrame after removing rows. If you wanted to remove from the existing DataFrame, you should use inplace=True .

How do you drop rows with Infinity pandas?

Use the pandas. DataFrame. dropna() method to drop the rows with infinite values.


One-liner. No transpose needed:

df.loc[~(df==0).all(axis=1)]

And for those who like symmetry, this also works...

df.loc[(df!=0).any(axis=1)]

It turns out this can be nicely expressed in a vectorized fashion:

> df = pd.DataFrame({'a':[0,0,1,1], 'b':[0,1,0,1]})
> df = df[(df.T != 0).any()]
> df
   a  b
1  0  1
2  1  0
3  1  1

I think this solution is the shortest :

df= df[df['ColName'] != 0]

I look up this question about once a month and always have to dig out the best answer from the comments:

df.loc[(df!=0).any(1)]

Thanks Dan Allan!


Replace the zeros with nan and then drop the rows with all entries as nan. After that replace nan with zeros.

import numpy as np
df = df.replace(0, np.nan)
df = df.dropna(how='all', axis=0)
df = df.replace(np.nan, 0)

Couple of solutions I found to be helpful while looking this up, especially for larger data sets:

df[(df.sum(axis=1) != 0)]       # 30% faster 
df[df.values.sum(axis=1) != 0]  # 3X faster 

Continuing with the example from @U2EF1:

In [88]: df = pd.DataFrame({'a':[0,0,1,1], 'b':[0,1,0,1]})

In [91]: %timeit df[(df.T != 0).any()]
1000 loops, best of 3: 686 µs per loop

In [92]: df[(df.sum(axis=1) != 0)]
Out[92]: 
   a  b
1  0  1
2  1  0
3  1  1

In [95]: %timeit df[(df.sum(axis=1) != 0)]
1000 loops, best of 3: 495 µs per loop

In [96]: %timeit df[df.values.sum(axis=1) != 0]
1000 loops, best of 3: 217 µs per loop

On a larger dataset:

In [119]: bdf = pd.DataFrame(np.random.randint(0,2,size=(10000,4)))

In [120]: %timeit bdf[(bdf.T != 0).any()]
1000 loops, best of 3: 1.63 ms per loop

In [121]: %timeit bdf[(bdf.sum(axis=1) != 0)]
1000 loops, best of 3: 1.09 ms per loop

In [122]: %timeit bdf[bdf.values.sum(axis=1) != 0]
1000 loops, best of 3: 517 µs per loop