I have a data frame like
df = pd.DataFrame({"A":[1,2,np.nan],"B":[np.nan,10,np.nan], "C":[5,10,7]})
     A     B   C
0  1.0   NaN   5
1  2.0  10.0  10
2  NaN   NaN   7 
I want to add a new column 'D'. Expected output is
     A     B   C    D
0  1.0   NaN   5    1.0
1  2.0  10.0  10    2.0
2  NaN   NaN   7    7.0
Thanks in advance!
Coalesce. This function comes in handy when there are one or more possible values that could be assigned to a variable or used in a given situation and there is a known preference for which value among the options should be selected for use if it's available.
The coalesce in MySQL can be used to return first not null value. If there are multiple columns, and all columns have NULL value then it returns NULL otherwise it will return first not null value. The syntax is as follows. SELECT COALESCE(yourColumnName1,yourColumnName2,yourColumnName3,.......
By using isnull(). values. any() method you can check if a pandas DataFrame contains NaN / None values in any cell (all rows & columns ). This method returns True if it finds NaN/None on any cell of a DataFrame, returns False when not found.
Another way is to explicitly fill column D with A,B,C in that order.
df['D'] = np.nan
df['D'] = df.D.fillna(df.A).fillna(df.B).fillna(df.C)
                        Another approach is to use the combine_first method of a pd.Series.  Using your example df,
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({"A":[1,2,np.nan],"B":[np.nan,10,np.nan], "C":[5,10,7]})
>>> df
     A     B   C
0  1.0   NaN   5
1  2.0  10.0  10
2  NaN   NaN   7
we have
>>> df.A.combine_first(df.B).combine_first(df.C)
0    1.0
1    2.0
2    7.0
We can use reduce to abstract this pattern to work with an arbitrary number of columns.
>>> from functools import reduce
>>> cols = [df[c] for c in df.columns]
>>> reduce(lambda acc, col: acc.combine_first(col), cols)
0    1.0
1    2.0
2    7.0
Name: A, dtype: float64
Let's put this all together in a function.
>>> def coalesce(*args):
...     return reduce(lambda acc, col: acc.combine_first(col), args)
...
>>> coalesce(*cols)
0    1.0
1    2.0
2    7.0
Name: A, dtype: float64
                        I think you need bfill with selecting first column by iloc:
df['D'] = df.bfill(axis=1).iloc[:,0]
print (df)
     A     B   C    D
0  1.0   NaN   5  1.0
1  2.0  10.0  10  2.0
2  NaN   NaN   7  7.0
same as:
df['D'] = df.fillna(method='bfill',axis=1).iloc[:,0]
print (df)
     A     B   C    D
0  1.0   NaN   5  1.0
1  2.0  10.0  10  2.0
2  NaN   NaN   7  7.0
                        option 1pandas  
df.assign(D=df.lookup(df.index, df.isnull().idxmin(1)))
     A     B   C    D
0  1.0   NaN   5  1.0
1  2.0  10.0  10  2.0
2  NaN   NaN   7  7.0
option 2numpy  
v = df.values
j = np.isnan(v).argmin(1)
df.assign(D=v[np.arange(len(v)), j])
     A     B   C    D
0  1.0   NaN   5  1.0
1  2.0  10.0  10  2.0
2  NaN   NaN   7  7.0
naive time test
over given data  

over larger data

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With