I have two dataframes in Pandas which are being merged together df.A and df.B, df.A is the original, and df.B has the new data I want to bring over. The merge works fine and as expected I get two columns col_x and col_y in the merged df.
However, in some rows, the original df.A has values where the other df.B does not. My question is, how can I selectively take the values from col_x and col_y and place them into a new col such as col_z ?
Here's what I mean, how can I merge df.A:
date impressions spend col
1/1/15 100000 3.00 ABC123456
1/2/15 145000 5.00 ABCD00000
1/3/15 300000 15.00 (null)
with df.B
date col
1/1/15 (null)
1/2/15 (null)
1/3/15 DEF123456
To get:
date impressions spend col_z
1/1/15 100000 3.00 ABC123456
1/2/15 145000 5.00 ABCD00000
1/3/15 300000 15.00 DEF123456
Any help or point in the right direction would be really appreciated!
Thanks
Within pandas, a missing value is denoted by NaN . In most cases, the terms missing and null are interchangeable, but to abide by the standards of pandas, we'll continue using missing throughout this tutorial.
Pandas DataFrame isnull() Method The isnull() method returns a DataFrame object where all the values are replaced with a Boolean value True for NULL values, and otherwise False.
OK assuming that your (null) values are in fact NaN values and not that string then the following works:
In [10]:
# create the merged df
merged = dfA.merge(dfB, on='date')
merged
Out[10]:
date impressions spend col_x col_y
0 2015-01-01 100000 3 ABC123456 NaN
1 2015-01-02 145000 5 ABCD00000 NaN
2 2015-01-03 300000 15 NaN DEF123456
You can use where
to conditionally assign a value from the _x and _y columns:
In [11]:
# now create col_z using where
merged['col_z'] = merged['col_x'].where(merged['col_x'].notnull(), merged['col_y'])
merged
Out[11]:
date impressions spend col_x col_y col_z
0 2015-01-01 100000 3 ABC123456 NaN ABC123456
1 2015-01-02 145000 5 ABCD00000 NaN ABCD00000
2 2015-01-03 300000 15 NaN DEF123456 DEF123456
You can then drop
the extraneous columns:
In [13]:
merged = merged.drop(['col_x','col_y'],axis=1)
merged
Out[13]:
date impressions spend col_z
0 2015-01-01 100000 3 ABC123456
1 2015-01-02 145000 5 ABCD00000
2 2015-01-03 300000 15 DEF123456
IMO the shortest and yet readable solution is something like that:
df.A.loc[df.A['col'].isna(), 'col'] = df.A.merge(df.B, how='left', on='date')['col_y']
What it basically does is assigning values from merged table column col_y
to primary df.A
table, for those rows in col
column, which are empty (.isna()
condition).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With