Merge Only When Value is Empty/Null in Pandas

Tags: python, merge, pandas

I have two dataframes in Pandas, df.A and df.B, that I am merging. df.A is the original, and df.B has the new data I want to bring over. The merge works fine and, as expected, I get two columns, col_x and col_y, in the merged df.

However, in some rows the original df.A has values where df.B does not. My question is: how can I selectively take the values from col_x and col_y and place them into a new column such as col_z?

Here's what I mean. How can I merge df.A:

date   impressions    spend    col
1/1/15 100000         3.00     ABC123456
1/2/15 145000         5.00     ABCD00000
1/3/15 300000         15.00    (null)

with df.B

date    col
1/1/15  (null)
1/2/15  (null)
1/3/15  DEF123456

To get:

date   impressions    spend    col_z
1/1/15 100000         3.00     ABC123456
1/2/15 145000         5.00     ABCD00000
1/3/15 300000         15.00    DEF123456

Any help or a pointer in the right direction would be really appreciated!

Thanks

572

asked May 18 '15 06:05

Jonathan Kennedy

2 Answers

OK, assuming that your (null) values are in fact NaN values and not the literal string "(null)", the following works:

In [10]:
# create the merged df
merged = dfA.merge(dfB, on='date')
merged

Out[10]:
        date  impressions  spend      col_x      col_y
0 2015-01-01       100000      3  ABC123456        NaN
1 2015-01-02       145000      5  ABCD00000        NaN
2 2015-01-03       300000     15        NaN  DEF123456
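To reproduce the merge above, here is a minimal sketch of the two sample frames. It assumes that (null) in the question means NaN and that the dates are parsed with pd.to_datetime; the construction itself is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question; "(null)" is assumed to mean NaN, not a literal string.
dfA = pd.DataFrame({
    "date": pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-03"]),
    "impressions": [100000, 145000, 300000],
    "spend": [3.0, 5.0, 15.0],
    "col": ["ABC123456", "ABCD00000", np.nan],
})
dfB = pd.DataFrame({
    "date": pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-03"]),
    "col": [np.nan, np.nan, "DEF123456"],
})

# "col" exists in both frames and is not a join key, so pandas appends
# the default suffixes "_x" (left frame) and "_y" (right frame).
merged = dfA.merge(dfB, on="date")
print(merged.columns.tolist())
```

Note that merge defaults to an inner join, so a date missing from either frame would drop out of the result; how='left' keeps every row of dfA.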

You can use where to conditionally assign a value from the _x and _y columns:

In [11]:
# now create col_z using where
merged['col_z'] = merged['col_x'].where(merged['col_x'].notnull(), merged['col_y'])
merged

Out[11]:
        date  impressions  spend      col_x      col_y      col_z
0 2015-01-01       100000      3  ABC123456        NaN  ABC123456
1 2015-01-02       145000      5  ABCD00000        NaN  ABCD00000
2 2015-01-03       300000     15        NaN  DEF123456  DEF123456
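As a side note, the where call above is one of several ways to write "take col_x, fall back to col_y". The sketch below compares it with fillna and combine_first on a small stand-in for the merged frame (the stand-in data is an assumption copied from the output shown above).

```python
import numpy as np
import pandas as pd

# Stand-in for the merged frame shown above (only the two suffixed columns).
merged = pd.DataFrame({
    "col_x": ["ABC123456", "ABCD00000", np.nan],
    "col_y": [np.nan, np.nan, "DEF123456"],
})

# Approach from the answer: keep col_x where it is not null, otherwise take col_y.
via_where = merged["col_x"].where(merged["col_x"].notnull(), merged["col_y"])

# Same result: fill the gaps in col_x with the index-aligned values from col_y.
via_fillna = merged["col_x"].fillna(merged["col_y"])

# Same result: combine_first prefers the caller's values and falls back to the argument.
via_combine = merged["col_x"].combine_first(merged["col_y"])

merged["col_z"] = via_fillna
print(merged)
```

All three prefer col_x whenever both columns hold a value; swap the two columns if df.B should win instead.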

You can then drop the extraneous columns:

In [13]:

merged = merged.drop(['col_x','col_y'],axis=1)
merged

Out[13]:
        date  impressions  spend      col_z
0 2015-01-01       100000      3  ABC123456
1 2015-01-02       145000      5  ABCD00000
2 2015-01-03       300000     15  DEF123456
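The three steps above (merge, fill, drop) can be wrapped in a small function. This is only a sketch: merge_fill_missing is a hypothetical helper name, and it uses how='left' rather than the default inner join so that every row of the left frame is kept.

```python
import numpy as np
import pandas as pd

def merge_fill_missing(left, right, key, col):
    """Hypothetical helper: merge on `key`, keep left[col] and fall back
    to right[col] wherever the left value is NaN."""
    merged = left.merge(right[[key, col]], on=key, how="left",
                        suffixes=("_x", "_y"))
    merged[col + "_z"] = merged[col + "_x"].fillna(merged[col + "_y"])
    return merged.drop([col + "_x", col + "_y"], axis=1)

# Usage with the question's sample data ("(null)" assumed to be NaN).
dfA = pd.DataFrame({
    "date": ["1/1/15", "1/2/15", "1/3/15"],
    "impressions": [100000, 145000, 300000],
    "spend": [3.0, 5.0, 15.0],
    "col": ["ABC123456", "ABCD00000", np.nan],
})
dfB = pd.DataFrame({
    "date": ["1/1/15", "1/2/15", "1/3/15"],
    "col": [np.nan, np.nan, "DEF123456"],
})
result = merge_fill_missing(dfA, dfB, key="date", col="col")
print(result)
```

If the right frame has several rows for the same key, the merge repeats the matching left rows, so it may be worth de-duplicating the right frame first.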

177

answered Oct 22 '22 07:10

EdChum

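IMO the shortest solution that is still readable is something like this (writing dfA and dfB for the two frames):

dfA.loc[dfA['col'].isna(), 'col'] = dfA.merge(dfB, how='left', on='date')['col_y']

It assigns values from the merged column col_y back to the primary dfA table, but only for the rows whose col is empty (the .isna() condition). The assignment aligns on index and the merged frame gets a fresh 0..n-1 index, so this relies on dfA having the default index and dfB having one row per date.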
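If dfA does not have the default 0..n-1 index, one alternative is to look the values up by date instead of by row position. The sketch below is a variation, not the answerer's code; the non-default index and the name lookup are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

dfA = pd.DataFrame({
    "date": ["1/1/15", "1/2/15", "1/3/15"],
    "col": ["ABC123456", "ABCD00000", np.nan],
}, index=[10, 20, 30])  # deliberately not 0..n-1
dfB = pd.DataFrame({
    "date": ["1/1/15", "1/2/15", "1/3/15"],
    "col": [np.nan, np.nan, "DEF123456"],
})

# Build a date -> value lookup from dfB (this needs unique dates in dfB).
lookup = dfB.set_index("date")["col"]

# map() returns a Series that keeps dfA's index, so fillna lines the
# values up correctly no matter how dfA is indexed.
dfA["col"] = dfA["col"].fillna(dfA["date"].map(lookup))
print(dfA)
```

Existing values in dfA are left untouched; only the NaN entries are filled from dfB.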

answered Oct 22 '22 05:10

Oskar_U
