What is a pandoric way to get a value and index of the first non-zero element in each column of a DataFrame (top to bottom)?
import pandas as pd
df = pd.DataFrame([[0, 0, 0],
                   [0, 10, 0],
                   [4, 0, 0],
                   [1, 2, 3]],
                  columns=['first', 'second', 'third'])
print(df.head())
# first second third
# 0 0 0 0
# 1 0 10 0
# 2 4 0 0
# 3 1 2 3
What I would like to achieve:
# value pos
# first 4 2
# second 10 1
# third 3 3
Here's the long-winded way. It stops scanning each column at the first non-zero value, so it should be faster when non-zero values tend to occur near the start of large arrays:
import pandas as pd
df = pd.DataFrame([[0, 0, 0],[0, 10, 0],[4, 0, 0],[1, 2, 3]],
columns=['first', 'second', 'third'])
res = [next(((j, i) for i, j in enumerate(df[col]) if j != 0), (0, 0)) for col in df]
df_res = pd.DataFrame(res, columns=['value', 'position'], index=df.columns)
print(df_res)
value position
first 4 2
second 10 1
third 3 3
You're looking for idxmax, which gives you the first position of the maximum. However, you need to find the max of "not equal to zero":
df.ne(0).idxmax()
first 2
second 1
third 3
dtype: int64
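One caveat worth sketching: if a column contains no non-zero value at all, the boolean mask is all False, and idxmax still returns the first row label. The example below is a minimal illustration; `df_zero`, `has_nonzero`, and `safe_pos` are names made up for this sketch, not part of the original answer.

```python
import pandas as pd

# Hypothetical frame where column 'b' is entirely zero.
df_zero = pd.DataFrame({'a': [0, 5, 0], 'b': [0, 0, 0]})

mask = df_zero.ne(0)
pos = mask.idxmax()        # 'b' has no True value, so the first label (0) comes back
has_nonzero = mask.any()   # False for columns that are all zero

# Blank out positions for all-zero columns (the result becomes float with NaN).
safe_pos = pos.where(has_nonzero)
print(safe_pos)
```

Checking `mask.any()` alongside idxmax is one way to tell a genuine hit in row 0 apart from a column with nothing to find.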
We can couple this with lookup and assign:
df.ne(0).idxmax().to_frame('pos').assign(val=lambda d: df.lookup(d.pos, d.index))
pos val
first 2 4
second 1 10
third 3 3
Same answer packaged slightly differently.
m = df.ne(0).idxmax()
pd.DataFrame(dict(pos=m, val=df.lookup(m, m.index)))
pos val
first 2 4
second 1 10
third 3 3
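DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0, so the snippets above only run on older versions. A possible replacement, sketched here under the assumption that the row labels are unique, translates the labels from idxmax into integer positions and indexes the underlying NumPy array. The names `rows`, `cols`, and `result` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 0, 0], [0, 10, 0], [4, 0, 0], [1, 2, 3]],
                  columns=['first', 'second', 'third'])

m = df.ne(0).idxmax()                       # row label of the first non-zero per column
rows = df.index.get_indexer(m.to_numpy())   # row labels -> integer positions
cols = np.arange(df.shape[1])               # m is ordered like df.columns

# Pair each row position with its column position, as lookup used to do.
result = pd.DataFrame({'pos': m, 'val': df.to_numpy()[rows, cols]})
print(result)
```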
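I would use stack; the index of the result holds the row and column labels. Note that this picks each row's non-zero maximum, which coincides with the first non-zero value per column for this particular data: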
df[df.eq(df.max(1),0)&df.ne(0)].stack()
Out[252]:
1 second 10.0
2 first 4.0
3 third 3.0
dtype: float64
You can also use NumPy's nonzero function for this.
positions = [df[col].to_numpy().nonzero()[0][0] for col in df]
df_res = pd.DataFrame({'value': df.to_numpy()[(positions, range(3))],
'position': positions}, index=df.columns)
print(df_res)
value position
first 4 2
second 10 1
third 3 3
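A fully vectorized variant of the same idea, offered as a sketch rather than a drop-in replacement: argmax on the boolean mask finds the first True in every column at once, and np.take_along_axis pulls out the matching values without hard-coding the number of columns. Like the loop version, it assumes every column has at least one non-zero entry; an all-zero column would silently report position 0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 0, 0], [0, 10, 0], [4, 0, 0], [1, 2, 3]],
                  columns=['first', 'second', 'third'])

arr = df.to_numpy()
positions = (arr != 0).argmax(axis=0)   # first True per column

# Select arr[positions[j], j] for every column j in one call.
values = np.take_along_axis(arr, positions[None, :], axis=0)[0]

df_res = pd.DataFrame({'value': values, 'position': positions},
                      index=df.columns)
print(df_res)
```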