I have been struggling with what seems like a simple merge between rows. I have two pandas DataFrames with the following column values:
df_a.columns.to_list()
['id','food','color','type','shape']
df_b.columns.to_list()
['id','food','smell','date']
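For reproducibility, the two DataFrames can be sketched like this (row values assumed from the tables shown further down):

```python
import pandas as pd

# Hypothetical sample data matching the tables in the question
df_a = pd.DataFrame({
    'id': [1, 2, 3],
    'food': ['apple', 'cheese', 'lemon'],
    'color': ['red', 'yellow', 'green'],
    'type': ['fruit', 'dairy', 'fruit'],
    'shape': ['round', 'squared', 'round'],
})
df_b = pd.DataFrame({
    'id': [1, 2],
    'food': ['apple', 'cheese'],
    'smell': ['soft', 'soft'],
    'date': pd.to_datetime(['2020-06-05', '2020-06-07']),
})
```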
I want to check whether any foods appear in both DataFrames so I can merge those rows into one:
df_total = pd.concat([df_a, df_b], keys=['A', 'B'], ignore_index=False)
df_total = df_total.sort_values(by=['food'], ascending=True)
df_total['food'].value_counts().loc[lambda x : x>=2]
Out[1]
apple 2
cheese 2
According to this, "apple" and "cheese" are duplicated. Printing the concatenated table gives:
id food color type shape smell date
-----------------------------------------------------------------
1 apple red fruit round NaN NaT
1 apple NaN NaN NaN soft 2020-06-05
2 cheese yellow dairy squared NaN NaT
2 cheese NaN NaN NaN soft 2020-06-07
3 lemon green fruit round NaN NaT
Desired output:
id food color type shape smell date
-----------------------------------------------------------------
1 apple red fruit round soft 2020-06-05
2 cheese yellow dairy squared soft 2020-06-07
3 lemon green fruit round NaN NaT
My attempt:
Redefine df_total, this time with pd.merge, using .reset_index() on both DataFrames.
df_total = pd.merge(df_a.reset_index(),df_b.reset_index(), how = 'right/left/outer/inner')
For how I have tried the values "right", "left", "outer", and "inner", but each merge behaves as if I had simply deleted one of the rows, or it ends up with no values at all. How can I get the desired output?
You could take advantage of the first/last capabilities of groupby.
In this case:
df_total.groupby(['food']).last().reset_index()
Output
     food  id   color   type    shape smell        date
0   apple   1     red  fruit    round  soft  2020-06-05
1  cheese   2  yellow  dairy  squared  soft  2020-06-07
2   lemon   3   green  fruit    round   NaN         NaT
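Put together, a minimal runnable sketch (sample values assumed from the tables in the question). Note that GroupBy.last() returns the last non-null value per column within each group, which is what fills the NaN gaps left by the concat:

```python
import pandas as pd

df_a = pd.DataFrame({
    'id': [1, 2, 3],
    'food': ['apple', 'cheese', 'lemon'],
    'color': ['red', 'yellow', 'green'],
    'type': ['fruit', 'dairy', 'fruit'],
    'shape': ['round', 'squared', 'round'],
})
df_b = pd.DataFrame({
    'id': [1, 2],
    'food': ['apple', 'cheese'],
    'smell': ['soft', 'soft'],
    'date': pd.to_datetime(['2020-06-05', '2020-06-07']),
})

# Stack both frames; columns missing from one side become NaN/NaT
df_total = pd.concat([df_a, df_b], keys=['A', 'B'])
df_total = df_total.sort_values(by=['food'])

# .last() picks the last non-null value per column in each food group,
# so the df_a row and the df_b row collapse into one complete row
result = df_total.groupby('food').last().reset_index()
print(result)
```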