Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I find: Is the first non-NaN value in each column the maximum for that column in a DataFrame?

For example:

      0     1
0  87.0   NaN
1   NaN  99.0
2   NaN   NaN
3   NaN   NaN
4   NaN  66.0
5   NaN   NaN
6   NaN  77.0
7   NaN   NaN
8   NaN   NaN
9  88.0   NaN

My expected output is: [False, True] since 87 is the first !NaN value but not the maximum in column 0. 99 however is the first !NaN value and is indeed the max in that column.

like image 770
Koray Tugay Avatar asked Dec 05 '22 11:12

Koray Tugay


1 Answers

Option a): Just do groupby with first

(May not be 100% reliable )

df.groupby([1]*len(df)).first()==df.max()
Out[89]: 
       0     1
1  False  True

Option b): bfill

Or using bfill(Fill any NaN value by the backward value in the column , then the first row after bfill is the first not NaN value )

df.bfill().iloc[0]==df.max()
Out[94]: 
0    False
1     True
dtype: bool

Option c): stack

df.stack().reset_index(level=1).drop_duplicates('level_1').set_index('level_1')[0]==df.max()
Out[102]: 
level_1
0    False
1     True
dtype: bool

Option d): idxmax with first_valid_index

df.idxmax()==df.apply(pd.Series.first_valid_index)
Out[105]: 
0    False
1     True
dtype: bool

Option e)(From Pir): idxmax with isna

df.notna().idxmax() == df.idxmax()     
Out[107]: 
0    False
1     True
dtype: bool
like image 164
BENY Avatar answered Jan 04 '23 22:01

BENY