Find rows with non zero values in a subset of columns in pandas dataframe

Question

I have a datframe with 4 columns of strings and others as integers. Now I need to find out those rows of data where at least one of the column is a non-zero value (or > 0).

manwra,sahAyaH,T7,0,0,0,0,T
manwra, akriti,T5,0,0,1,0,K 
awma, prabrtih,B6, 0,1,1,0,S

My output should be

manwra, akriti,T5,0,0,1,0,K 
awma, prabrtih,B6, 0,1,1,0,S

I have tried the following to obtain the answer. The string values are in colums 0,1,2 and -1 (last column).

KT[KT.ix[:,3:-2] != 0]

What I am receiving as output is

NaN,NaNNaN,NaN,NaN,NaN,NaN,NaN
NaN,NaN,NaN,NaN,NaN,1,NaN,NaN
NaN,NaN,NaN,NaN,1,1,NaN,NaN

How to obtain the desired output

Magnus Persson · Accepted Answer

Alternative solution

An alternative solution, which could be useful in certain situations. For example when plotting something and you do not want to create new copies of DataFrames just to filter out some rows and plot it. An added bonus is that the column dtypes in this solution are not limited to numbers, wherever you can do a comparison you can just add it to the eval call.

Guided answer from the start, first to read the data I use the pandas.read_clipboard() function.

c0 c1 c2 c3 c4 c5 c6 c7
manwra sahAyaH T7 0 0 0 0 T
manwra akriti T5 0 0 1 0 K 
awma prabrtih B6 0 1 1 0 S

I added column names c0-c7 here. Just select this table and copy Ctrl+C it to the clipboard.

import pandas as pd 
df = pd.read_clipboard()

and then by chaining an .eval() with a .query() on the DataFrame

df.eval("all_zero = (c3==0)&(c4==0)&(c5==0)&(c6==0)").query("all_zero==False")

we get the desired results.

	c0	c1	c2	c3	c4	c5	c6	c7	all_zero
1	manwra	akriti	T5	0	0	1	0	K	False
2	awma	prabrtih	B6	0	1	1	0	S	False

In the eval call we are just doing comparisons on the columns we want to be zero, if they are all zero the & sign will combine them to True (1) in column all_zero, then we can query the results and say we only want rows where this new column is False (we could as well say all_zero!=True, or changed the comparison in the eval call).

If you do not want the all_zero column just drop that as well

df.eval("all_zero = (c3==0)&(c4==0)&(c5==0)&(c6==0)").query("all_zero==False").drop('all_zero',axis=1)

	c0	c1	c2	c3	c4	c5	c6	c7
1	manwra	akriti	T5	0	0	1	0	K
2	awma	prabrtih	B6	0	1	1	0	S

Timings / Speed comparison

Comparing the speed of the solutions on a bigger DataFrame of same layout we see that my solution is slower (as expected), but it has some added benefits in terms of filtering more broadly. To compare timings I replicated the DataFrame so that it contains 30 000 rows instead

df = df.append([df]*(10000-1),ignore_index=True)

and then ran the different solutions on it and timed it.

Quick timing comparison for the different solutions

I will leave my solution here for reference!

MaxU - stop WAR against UA · Answer

Here is an alternative solution which uses select_dtypes() method:

In [41]: df[(df.select_dtypes(include=['number']) != 0).any(1)]
Out[41]:
        0          1   2  3  4  5  6  7
1  manwra     akriti  T5  0  0  1  0  K
2    awma   prabrtih  B6  0  1  1  0  S

Explanation:

In [42]: df.select_dtypes(include=['number']) != 0
Out[42]:
       3      4      5      6
0  False  False  False  False
1  False  False   True  False
2  False   True   True  False

In [43]: (df.select_dtypes(include=['number']) != 0).any(1)
Out[43]:
0    False
1     True
2     True
dtype: bool

Find rows with non zero values in a subset of columns in pandas dataframe

Tags:

python

pandas

dataframe

Amrith Krishna

2 Answers

Alternative solution

Timings / Speed comparison

Magnus Persson

MaxU - stop WAR against UA

Recent Activity

Donate For Us

Find rows with non zero values in a subset of columns in pandas dataframe

Tags:

python

pandas

dataframe

Amrith Krishna

2 Answers

Alternative solution

Timings / Speed comparison

Magnus Persson

MaxU - stop WAR against UA

Related questions

Recent Activity

Donate For Us