I have two Data Frames with identical column names and identical IDs in the first column. With the exception of the ID column, every cell that contains a value in one DataFrame contains NaN in the other. Here's an example of what they look like:
ID Cat1 Cat2 Cat3
1 NaN 75 NaN
2 61 NaN 84
3 NaN NaN NaN
ID Cat1 Cat2 Cat3
1 54 NaN 44
2 NaN 38 NaN
3 49 50 53
I want to merge them into one DataFrame while keeping the same Column Names. So the result would look like this:
ID Cat1 Cat2 Cat3
1 54 75 44
2 61 38 84
3 49 50 53
I tried:
df3 = pd.merge(df1, df2, on='ID', how='outer')
Which gave me a DataFrame containing twice as many columns. How can I merge the values from each DataFrame into one?
You probably want df.update. See the documentation.
df1.update(df2, raise_conflict=True)
In this case, the combine_first function is appropriate. (http://pandas.pydata.org/pandas-docs/version/0.13.1/merging.html)
As the name implies, combine_first takes the first DataFrame and adds to it with values from the second wherever it finds a NaN value in the first.
So:
df3 = df1.combine_first(df2)
produces a new DataFrame, df3, that is essentially just df1 with values from df2 filled in whenever possible.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With