Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas: Add columns to DataFrame based off existing column

I have 2 pandas DataFrames that I am manipulating in Python, and I would like to combine them based on a certain column (col_1):

#df1
    col_1   col_2          
0       3       7            
1       4       8            
2       5       5    

#df2
  col_1    col_3 
0     4       78
1     5       89

and the output should look like this:

    col_1   col_2   col_3
0   3       7       0
1   4       8       78
2   5       5       89  

I'm not sure about how to go about this using 'best practices'.

Thank you.

like image 850
Mike H Avatar asked Dec 14 '22 17:12

Mike H


2 Answers

Suppose you have:

df1=pd.DataFrame({'col_1': {0: 3, 1: 4, 2: 5}, 'col_2': {0: 7, 1: 8, 2: 5}})
df2=pd.DataFrame({'col_1': {0: 4, 1: 5}, 'col_3': {0: 78, 1: 89}})

You can use merge:

pd.merge(df1,df2,on='col_1',how='left').fillna(0)
Out[22]: 
   col_1  col_2  col_3
0      3      7    0.0
1      4      8   78.0
2      5      5   89.0

If you need col_3 as int (thanks to piRSquared for his suggestion):

pd.merge(df1,df2,on='col_1',how='left').fillna(0, downcast='infer')

Out[25]: 
   col_1  col_2  col_3
0      3      7      0
1      4      8     78
2      5      5     89
like image 62
Allen Avatar answered Jan 25 '23 05:01

Allen


You can use join after using set_index.

df1.join(df2.set_index('col_1'), on='col_1').fillna(0, downcast='infer')

   col_1  col_2  col_3
0      3      7      0
1      4      8     78
2      5      5     89
like image 27
piRSquared Avatar answered Jan 25 '23 05:01

piRSquared