I'm trying to merge two dataframes in pandas
on a common column name (orderid). The resulting dataframe (the merged dataframe) is dropping the orderid from the 2nd data frame. Per the documentation, the 'on' column should be kept unless you explicitly tell it not to.
import pandas as pd
df = pd.DataFrame([[1,'a'], [2, 'b'], [3, 'c']], columns=['orderid', 'ordervalue'])
df['orderid'] = df['orderid'].astype(str)
df2 = pd.DataFrame([[1,200], [2, 300], [3, 400], [4,500]], columns=['orderid', 'ordervalue'])
df2['orderid'] = df2['orderid'].astype(str)
pd.merge(df, df2, on='orderid', how='outer', copy=True, suffixes=('_left', '_right'))
Which outputs this:
| |orderid | ordervalue_left | ordervalue_right |
|------|--------|-----------------|------------------|
| 0 | 1 | a | 200 |
| 1 | 2 | b | 300 |
| 2 | 3 | c | 400 |
| 3 | 4 | | 500 |
What I am trying to create is this:
| | orderid_left | ordervalue_left | orderid_left | ordervalue_right |
|------|--------------|-----------------|--------------|------------------|
| 0 | 1 | a | 1 | 200 |
| 1 | 2 | b | 2 | 300 |
| 2 | 3 | c | 3 | 400 |
| 3 | NaN | NaN | 4 | 500 |
How should I write this?
Rename the orderid
columns so that df
has a column named orderid_left
,
and df2
has a column named orderid_right
:
import pandas as pd
df = pd.DataFrame([[1,'a'], [2, 'b'], [3, 'c']], columns=['orderid', 'ordervalue'])
df['orderid'] = df['orderid'].astype(str)
df2 = pd.DataFrame([[1,200], [2, 300], [3, 400], [4,500]], columns=['orderid', 'ordervalue'])
df2['orderid'] = df2['orderid'].astype(str)
df = df.rename(columns={'orderid':'orderid_left'})
df2 = df2.rename(columns={'orderid':'orderid_right'})
result = pd.merge(df, df2, left_on='orderid_left', right_on='orderid_right',
how='outer', suffixes=('_left', '_right'))
print(result)
yields
orderid_left ordervalue_left orderid_right ordervalue_right
0 1 a 1 200
1 2 b 2 300
2 3 c 3 400
3 NaN NaN 4 500
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With