I have a Pandas DataFrame:
import pandas as pd
df = pd.DataFrame([[0.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
                   [1.0, 0.0, 1.0, 3.0, 1.0, 1.0, 7.0, 0.0],
                   [0.0, 0.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0]],
                  columns=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'])
     A    B     C     D     E     F     G     H
0  0.0  2.0   3.0   4.0   5.0   6.0   7.0   8.0
1  1.0  0.0   1.0   3.0   1.0   1.0   7.0   0.0
2  0.0  0.0  13.0  14.0  15.0  16.0  17.0  18.0
I'd like to return a Series (not a list) of the first non-zero value in each row. The following currently works, but lookup returns a NumPy array rather than a Series. I know I can convert the result to a Series, but I'm assuming there's a better way:
first_nonzero_colnames = (df > 0).idxmax(axis=1, skipna=True)
df.lookup(first_nonzero_colnames.index, first_nonzero_colnames.values)
[ 2. 1. 13.]
I can use .apply, but I want to avoid it.
@acushner's answer is better; I'm just putting this out there.
Use idxmax and apply:
m = (df != 0).idxmax(1)
df.T.apply(lambda x: x[m[x.name]])
0 2.0
1 1.0
2 13.0
dtype: float64
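If the goal is to avoid apply entirely, one possible sketch is to mask the zeros as NaN, back-fill along each row, and take the first column. This is an alternative idiom rather than anything from the answers above, and it assumes the frame has no genuine NaN values, since those would be skipped along with the zeros:

```python
import pandas as pd

df = pd.DataFrame([[0.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
                   [1.0, 0.0, 1.0, 3.0, 1.0, 1.0, 7.0, 0.0],
                   [0.0, 0.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0]],
                  columns=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'])

# Replace zeros with NaN so they are treated as missing values.
masked = df.where(df != 0)

# Back-fill along each row: every NaN takes the next non-NaN value to its right.
# After that, the first column holds the first non-zero value of each row.
first_nonzero = masked.bfill(axis=1).iloc[:, 0]

print(first_nonzero)
```

The result is a Series indexed like df. It keeps the name of the first column ('A' here), and a row that is entirely zero comes out as NaN.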
This also works:
m = (df != 0).idxmax(1)
t = list(zip(m.index, m.values))  # materialize the (row, column) pairs; a bare zip is a one-shot iterator
df.stack().loc[t].reset_index(level=1, drop=True)
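Note that DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0, so the lookup-based version no longer runs on current releases. A rough equivalent, sketched here with plain NumPy indexing and wrapped back into a Series, could look like the following. The variable names are illustrative, and the NaN handling for all-zero rows is an assumption about the desired behavior:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
                   [1.0, 0.0, 1.0, 3.0, 1.0, 1.0, 7.0, 0.0],
                   [0.0, 0.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0]],
                  columns=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'])

values = df.to_numpy()
mask = values != 0

# argmax on a boolean array returns the position of the first True in each row
# (and 0 when the row has no True at all).
first_pos = mask.argmax(axis=1)

# Pick one element per row with paired row/column positions.
picked = values[np.arange(len(df)), first_pos]

# Wrap the result in a Series and blank out rows that had no non-zero value.
first_nonzero = pd.Series(picked, index=df.index).where(mask.any(axis=1))

print(first_nonzero)
```

This stays vectorized and returns a Series directly, so no conversion step is needed afterwards.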