I'm trying to get the first non-null value across multiple pandas Series (columns) in a DataFrame.
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [2, np.nan, np.nan, np.nan],
                   'b': [np.nan, 5, np.nan, np.nan],
                   'c': [np.nan, 55, 13, 14],
                   'd': [np.nan, np.nan, np.nan, 4],
                   'e': [12, np.nan, np.nan, 22],
                   })
     a    b     c    d     e
0  2.0  NaN   NaN  NaN  12.0
1  NaN  5.0  55.0  NaN   NaN
2  NaN  NaN  13.0  NaN   NaN
3  NaN  NaN  14.0  4.0  22.0
In this df, I want to create a new column 'f' and set it equal to 'a' if a is not null, 'b' if b is not null, and so on down to e.
I could chain a bunch of np.where statements, but that is inefficient:
df['f'] = np.where(df.a.notnull(), df.a,
          np.where(df.b.notnull(), df.b,
          etc.))
I also looked into doing df.a or df.b or df.c, etc.
The result should look like:
     a    b     c    d     e   f
0  2.0  NaN   NaN  NaN  12.0   2
1  NaN  5.0  55.0  NaN   NaN   5
2  NaN  NaN  13.0  NaN   NaN  13
3  NaN  NaN  14.0  4.0  22.0  14
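One solution (note that groupby with axis=1 is deprecated in recent pandas releases, so this form applies to older versions):
df.groupby(['f']*df.shape[1], axis=1).first()
Out[385]:
      f
0   2.0
1   5.0
2  13.0
3  14.0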
The other is to back-fill across the columns and take the first one:
df.bfill(axis=1)['a']
Out[388]:
0     2.0
1     5.0
2    13.0
3    14.0
Name: a, dtype: float64
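To tie the back-fill idea to the column 'f' from the question, here is a minimal end-to-end sketch. The `cols` list and the variable names `first_bfill` and `first_combined` are my own, introduced only for illustration. It also shows `Series.combine_first` chained with `functools.reduce` as one possible alternative, which is roughly the Series analogue of the `df.a or df.b or df.c` idea. Treat it as one way to do this, not the only one.

```python
from functools import reduce

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [2, np.nan, np.nan, np.nan],
    'b': [np.nan, 5, np.nan, np.nan],
    'c': [np.nan, 55, 13, 14],
    'd': [np.nan, np.nan, np.nan, 4],
    'e': [12, np.nan, np.nan, 22],
})

# Columns in priority order (illustrative name; adjust to your own frame).
cols = ['a', 'b', 'c', 'd', 'e']

# Back-fill along each row so the leftmost column picks up the first
# non-null value to its right, then take that leftmost column.
first_bfill = df[cols].bfill(axis=1).iloc[:, 0]

# Alternative: fold the columns together, filling gaps in the running
# result with values from the next column in priority order.
first_combined = reduce(
    lambda left, right: left.combine_first(right),
    (df[c] for c in cols),
)

df['f'] = first_bfill
print(df)
```

Selecting with `.iloc[:, 0]` instead of `['a']` avoids hard-coding the name of the first column. Rows where every column is null stay NaN in 'f' with either approach.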