
Find First Non-zero Value in Each Row of Pandas DataFrame




I have a Pandas DataFrame:

import pandas as pd

df = pd.DataFrame([[0.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
                   [1.0, 0.0, 1.0, 3.0, 1.0, 1.0, 7.0, 0.0],
                   [0.0, 0.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0]],
                  columns=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'])

     A    B     C     D     E     F     G     H
0  0.0  2.0   3.0   4.0   5.0   6.0   7.0   8.0
1  1.0  0.0   1.0   3.0   1.0   1.0   7.0   0.0
2  0.0  0.0  13.0  14.0  15.0  16.0  17.0  18.0

I'd like to return a Series (not a list) containing the first non-zero value in each row. The following works, but lookup returns a NumPy array rather than a Series. I know I can convert the result to a Series, but I'm assuming there's a better way:

first_nonzero_colnames = (df > 0).idxmax(axis=1, skipna=True)
df.lookup(first_nonzero_colnames.index, first_nonzero_colnames.values)

[  2.   1.  13.]

I can use .apply but I want to avoid it.

slaw asked Jul 19 '16 20:07


1 Answer

@acushner's answer is better. Just putting this out there.

Use idxmax and apply:

m = (df != 0).idxmax(1)
df.T.apply(lambda x: x[m[x.name]])

0     2.0
1     1.0
2    13.0
dtype: float64
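
For context on why the mask-and-idxmax step works: comparing the frame to 0 gives a boolean frame, and idxmax returns the label of the first maximum in each row, which for booleans is the first True. The sketch below rebuilds the question's frame and prints the intermediate m. One caveat worth keeping in mind: a row with no non-zero entry has no True, so idxmax falls back to the first column label.

```python
import pandas as pd

df = pd.DataFrame(
    [[0.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
     [1.0, 0.0, 1.0, 3.0, 1.0, 1.0, 7.0, 0.0],
     [0.0, 0.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0]],
    columns=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
)

# Boolean mask: True wherever the value is non-zero
mask = df != 0

# For each row, the column label of the first True
m = mask.idxmax(axis=1)
print(m.tolist())  # ['B', 'A', 'C']
```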

This also works:

m = (df != 0).idxmax(1)
# list() is needed on Python 3, where zip returns a one-shot iterator
t = list(zip(m.index, m.values))

df.stack().loc[t].reset_index(level=1, drop=True)
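
A further option, not taken from the answer above and offered only as a sketch: DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0, so on newer versions the question's snippet no longer runs. Plain NumPy indexing can do the same positional lookup without apply. The names values, nonzero, first_pos, and first_nonzero are illustrative, and marking all-zero rows as NaN is an assumption about the desired behavior.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[0.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
     [1.0, 0.0, 1.0, 3.0, 1.0, 1.0, 7.0, 0.0],
     [0.0, 0.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0]],
    columns=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
)

values = df.to_numpy()
nonzero = values != 0

# argmax on a boolean array gives the position of the first True per row
first_pos = nonzero.argmax(axis=1)

# Pick one value per row using paired row/column positions
picked = values[np.arange(len(df)), first_pos]
first_nonzero = pd.Series(picked, index=df.index)

# Assumption: rows with no non-zero entry should be NaN, not the
# first column's value that argmax would otherwise select
first_nonzero = first_nonzero.where(nonzero.any(axis=1))

print(first_nonzero.tolist())  # [2.0, 1.0, 13.0]
```

The result is a Series indexed like df, which is what the question asks for.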
piRSquared answered Sep 29 '22 21:09
