I have 2 pandas DataFrames that I am manipulating in Python, and I would like to combine them based on a certain column (col_1):
#df1
    col_1   col_2          
0       3       7            
1       4       8            
2       5       5    
#df2
  col_1    col_3 
0     4       78
1     5       89
and the output should look like this:
    col_1   col_2   col_3
0   3       7       0
1   4       8       78
2   5       5       89  
I'm not sure about how to go about this using 'best practices'.
Thank you.
Suppose you have:
df1=pd.DataFrame({'col_1': {0: 3, 1: 4, 2: 5}, 'col_2': {0: 7, 1: 8, 2: 5}})
df2=pd.DataFrame({'col_1': {0: 4, 1: 5}, 'col_3': {0: 78, 1: 89}})
You can use merge:
pd.merge(df1,df2,on='col_1',how='left').fillna(0)
Out[22]: 
   col_1  col_2  col_3
0      3      7    0.0
1      4      8   78.0
2      5      5   89.0
If you need col_3 as int (thanks to piRSquared for his suggestion):
pd.merge(df1,df2,on='col_1',how='left').fillna(0, downcast='infer')
Out[25]: 
   col_1  col_2  col_3
0      3      7      0
1      4      8     78
2      5      5     89
                        You can use join after using set_index.
df1.join(df2.set_index('col_1'), on='col_1').fillna(0, downcast='infer')
   col_1  col_2  col_3
0      3      7      0
1      4      8     78
2      5      5     89
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With