How to implement sql coalesce in pandas

Tags: python, pandas

I have a data frame like

df = pd.DataFrame({"A":[1,2,np.nan],"B":[np.nan,10,np.nan], "C":[5,10,7]})
     A     B   C
0  1.0   NaN   5
1  2.0  10.0  10
2  NaN   NaN   7

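I want to add a new column 'D' that holds, for each row, the first non-null value among A, B and C (in that order), the way SQL's COALESCE would. Expected output is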

     A     B   C    D
0  1.0   NaN   5    1.0
1  2.0  10.0  10    2.0
2  NaN   NaN   7    7.0

Thanks in advance!

699

asked Apr 03 '17 06:04

Anoop

4 Answers

Another way is to explicitly fill column D with A,B,C in that order.

df['D'] = np.nan
df['D'] = df.D.fillna(df.A).fillna(df.B).fillna(df.C)
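The placeholder column is not strictly needed: the same chain can start from A directly. A minimal sketch on the example frame, assuming the priority order A, then B, then C:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, np.nan], "B": [np.nan, 10, np.nan], "C": [5, 10, 7]})

# Start from A, then fall back to B, then to C, wherever the value is still missing.
df["D"] = df["A"].fillna(df["B"]).fillna(df["C"])
print(df)
```

Each fillna call only touches rows that are still NaN, so earlier columns always take precedence.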

102

answered Oct 20 '22 23:10

philshem

Another approach is to use the combine_first method of a pd.Series. Using your example df,

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({"A":[1,2,np.nan],"B":[np.nan,10,np.nan], "C":[5,10,7]})
>>> df
     A     B   C
0  1.0   NaN   5
1  2.0  10.0  10
2  NaN   NaN   7

we have

>>> df.A.combine_first(df.B).combine_first(df.C)
0    1.0
1    2.0
2    7.0

We can use reduce to abstract this pattern to work with an arbitrary number of columns.

>>> from functools import reduce
>>> cols = [df[c] for c in df.columns]
>>> reduce(lambda acc, col: acc.combine_first(col), cols)
0    1.0
1    2.0
2    7.0
Name: A, dtype: float64

Let's put this all together in a function.

>>> def coalesce(*args):
...     return reduce(lambda acc, col: acc.combine_first(col), args)
...
>>> coalesce(*cols)
0    1.0
1    2.0
2    7.0
Name: A, dtype: float64
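To get the column D asked for in the question, the result can be assigned back to the frame. This is a sketch only; the `priority` list is a hypothetical name used here to make the column order explicit.

```python
from functools import reduce

import numpy as np
import pandas as pd


def coalesce(*series):
    # Left to right: each later Series only fills positions that are still missing.
    return reduce(lambda acc, col: acc.combine_first(col), series)


df = pd.DataFrame({"A": [1, 2, np.nan], "B": [np.nan, 10, np.nan], "C": [5, 10, 7]})

# Hypothetical priority list; reorder it to change which column wins.
priority = ["A", "B", "C"]
df["D"] = coalesce(*(df[c] for c in priority))
print(df)
```

Passing the columns explicitly also keeps D itself out of the inputs if the code is run a second time.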

answered Oct 20 '22 23:10

yardsale8

I think you need bfill along each row (axis=1), then select the first column with iloc:

df['D'] = df.bfill(axis=1).iloc[:,0]
print (df)
     A     B   C    D
0  1.0   NaN   5  1.0
1  2.0  10.0  10  2.0
2  NaN   NaN   7  7.0

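This is the same as the following, although the method argument of fillna is deprecated since pandas 2.1, so bfill is the safer spelling: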

df['D'] = df.fillna(method='bfill',axis=1).iloc[:,0]
print (df)
     A     B   C    D
0  1.0   NaN   5  1.0
1  2.0  10.0  10  2.0
2  NaN   NaN   7  7.0
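Note that df.bfill(axis=1) looks at every column of the frame, including D if it already exists. A sketch that restricts the back-fill to an explicit list of columns (the name `cols` is an assumption for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, np.nan], "B": [np.nan, 10, np.nan], "C": [5, 10, 7]})

cols = ["A", "B", "C"]  # coalesce order, left to right
# Back-fill along each row so the leftmost column ends up holding
# the first non-null value, then keep only that column.
df["D"] = df[cols].bfill(axis=1).iloc[:, 0]
print(df)
```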

answered Oct 20 '22 23:10

jezrael

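Option 1: pandas
(DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0, so this line only runs on older versions.)

df.assign(D=df.lookup(df.index, df.isnull().idxmin(1)))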

     A     B   C    D
0  1.0   NaN   5  1.0
1  2.0  10.0  10  2.0
2  NaN   NaN   7  7.0
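Where DataFrame.lookup is unavailable, the same idea can be sketched with Index.get_indexer plus NumPy integer indexing. The variable names below are assumptions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, np.nan], "B": [np.nan, 10, np.nan], "C": [5, 10, 7]})

# Label of the first non-null column in each row (False sorts before True).
first_valid = df.isnull().idxmin(axis=1)
# Translate those labels to integer positions, then pick one value per row.
col_pos = df.columns.get_indexer(first_valid)
row_pos = np.arange(len(df))
df = df.assign(D=df.to_numpy()[row_pos, col_pos])
print(df)
```

If a row is entirely null, idxmin returns the first column, so the picked value is still NaN.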

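Option 2: numpy

v = df.values
# position of the first non-NaN value in each row (argmin returns the first False)
j = np.isnan(v).argmin(1)
# pick one value per row at that position
df.assign(D=v[np.arange(len(v)), j])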

     A     B   C    D
0  1.0   NaN   5  1.0
1  2.0  10.0  10  2.0
2  NaN   NaN   7  7.0

Naive time test

[image: timings over the given data]

[image: timings over larger data]
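A comparison like this can be re-run locally with timeit. The sketch below is only a starting point: the frame size, the random seed, and the function names are assumptions, and no timings are claimed here.

```python
import timeit
from functools import reduce

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical larger frame: 10,000 rows, roughly half of A and B missing, C complete.
values = rng.random((10_000, 3))
mask = rng.random((10_000, 2)) < 0.5
values[:, :2] = np.where(mask, np.nan, values[:, :2])
big = pd.DataFrame(values, columns=list("ABC"))


def via_bfill(df):
    return df.bfill(axis=1).iloc[:, 0]


def via_combine_first(df):
    return reduce(lambda acc, col: acc.combine_first(col), [df[c] for c in df.columns])


def via_numpy(df):
    v = df.to_numpy()
    j = np.isnan(v).argmin(axis=1)  # first non-NaN position per row
    return pd.Series(v[np.arange(len(v)), j], index=df.index)


candidates = {"bfill": via_bfill, "combine_first": via_combine_first, "numpy": via_numpy}
# Compute each result once so the approaches can be checked against each other.
results = {name: fn(big) for name, fn in candidates.items()}

for name, fn in candidates.items():
    seconds = timeit.timeit(lambda: fn(big), number=5)
    print(f"{name}: {seconds / 5:.6f} s per call")
```

Checking that all candidates return the same values before timing them guards against comparing approaches that do different work.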

answered Oct 21 '22 01:10

piRSquared
