
Merge 2 dataframes with same values in a column

I have 2 dataframes. One is in this form:

df1:
     date      revenue
0  2016-11-17   385.943800
1  2016-11-18  1074.160340
2  2016-11-19  2980.857860
3  2016-11-20  1919.723960
4  2016-11-21   884.279340
5  2016-11-22   869.071070
6  2016-11-23   760.289260
7  2016-11-24  2481.689270
8  2016-11-25  2745.990070
9  2016-11-26  2273.413250
10 2016-11-27  2630.414900

The other one is in this form:

df2:

      CET    MaxTemp  MeanTemp MinTemp  MaxHumidity  MeanHumidity  MinHumidity
0  2016-11-17   11      9        7            100           85             63
1  2016-11-18   9       6        3             93           83             66
2  2016-11-19   8       6        4             93           87             76
3  2016-11-20   10      7        4             93           84             81
4  2016-11-21   14     10        7            100           89             77
5  2016-11-22   13     10        7             93           79             63
6  2016-11-23   11      8        5            100           91             82
7  2016-11-24   9       7        4             93           80             66
8  2016-11-25   7       4        1             87           74             57
9  2016-11-26   7       3       -1            100           88             61
10 2016-11-27  10       7        4            100           81             66   

Both dataframes have more rows than shown, and the number of rows will increase every day.

I want to combine these 2 dataframes so that wherever the same date appears in both df1['date'] and df2['CET'], df2 gets an extra column holding the revenue value for that date. So I want to create this:

df2:

      CET    MaxTemp  MeanTemp MinTemp  MaxHumidity  MeanHumidity  MinHumidity  revenue
0  2016-11-17   11      9        7            100           85             63   385.943800
1  2016-11-18   9       6        3             93           83             66  1074.160340
2  2016-11-19   8       6        4             93           87             76  2980.857860
3  2016-11-20   10      7        4             93           84             81  1919.723960
4  2016-11-21   14     10        7            100           89             77   884.279340
5  2016-11-22   13     10        7             93           79             63   869.071070
6  2016-11-23   11      8        5            100           91             82   760.289260
7  2016-11-24   9       7        4             93           80             66  2481.689270
8  2016-11-25   7       4        1             87           74             57  2745.990070
9  2016-11-26   7       3       -1            100           88             61  2273.413250
10 2016-11-27  10       7        4            100           81             66  2630.414900

Can someone help me do that?

joasa asked Feb 06 '23 09:02



1 Answer

I think you can use map:

df2['revenue'] = df2.CET.map(df1.set_index('date')['revenue'])

You can also convert the Series to a dict, which is a bit faster on a large DataFrame:

df2['revenue'] = df2.CET.map(df1.set_index('date')['revenue'].to_dict())

print (df2)
           CET  MaxTemp  MeanTemp  MinTemp  MaxHumidity  MeanHumidity  \
0   2016-11-17       11         9        7          100            85   
1   2016-11-18        9         6        3           93            83   
2   2016-11-19        8         6        4           93            87   
3   2016-11-20       10         7        4           93            84   
4   2016-11-21       14        10        7          100            89   
5   2016-11-22       13        10        7           93            79   
6   2016-11-23       11         8        5          100            91   
7   2016-11-24        9         7        4           93            80   
8   2016-11-25        7         4        1           87            74   
9   2016-11-26        7         3       -1          100            88   
10  2016-11-27       10         7        4          100            81   

    MinHumidity     revenue  
0            63   385.94380  
1            66  1074.16034  
2            76  2980.85786  
3            81  1919.72396  
4            77   884.27934  
5            63   869.07107  
6            82   760.28926  
7            66  2481.68927  
8            57  2745.99007  
9            61  2273.41325  
10           66  2630.41490  
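
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch built from the first three rows of the question's data (only two of the weather columns are kept for brevity). It also shows a left merge as an alternative; that is not part of the answer above, and it assumes the dates in df1 are unique. The names mapped and merged are only illustrative.

```python
import pandas as pd

# First three rows of the question's data, with dates parsed as datetimes.
df1 = pd.DataFrame({
    "date": pd.to_datetime(["2016-11-17", "2016-11-18", "2016-11-19"]),
    "revenue": [385.94380, 1074.16034, 2980.85786],
})
df2 = pd.DataFrame({
    "CET": pd.to_datetime(["2016-11-17", "2016-11-18", "2016-11-19"]),
    "MaxTemp": [11, 9, 8],
    "MinTemp": [7, 3, 4],
})

# Approach from the answer: look up each CET value in a date-indexed Series.
mapped = df2.copy()
mapped["revenue"] = mapped["CET"].map(df1.set_index("date")["revenue"])

# Alternative (assumption: dates in df1 are unique): a left merge keeps every
# row of df2 and leaves NaN where df1 has no matching date.
merged = (
    df2.merge(df1, left_on="CET", right_on="date", how="left")
       .drop(columns="date")
)

print(mapped)
print(merged)
```

Both results carry the same revenue column here. The map version leaves df2's other columns untouched, while the merge version is easier to extend if more than one column has to come over from df1.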

If all output values are NaN, the problem is that the columns CET and date have different dtypes:

print (df1.date.dtypes)
object
print (df2.CET.dtype)
datetime64[ns]

The solution is to convert the string column with to_datetime:

df1.date = pd.to_datetime(df1.date)
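
The mismatch can be reproduced on a small scale. The sketch below assumes, purely for illustration, that df1 stores its dates as strings while df2 stores them as datetimes; the names before and after are hypothetical.

```python
import pandas as pd

# Assumption for illustration: df1 holds dates as strings, df2 as datetimes.
df1 = pd.DataFrame({
    "date": ["2016-11-17", "2016-11-18"],
    "revenue": [385.94380, 1074.16034],
})
df2 = pd.DataFrame({"CET": pd.to_datetime(["2016-11-17", "2016-11-18"])})

# The string keys do not match the Timestamp values, so every lookup misses.
before = df2["CET"].map(df1.set_index("date")["revenue"])
print(before.isna().all())

# After the conversion both columns hold datetimes and the lookup matches.
df1["date"] = pd.to_datetime(df1["date"])
after = df2["CET"].map(df1.set_index("date")["revenue"])
print(after.tolist())
```

Checking the dtypes of both key columns first is usually quicker than debugging the map or merge call itself.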
jezrael answered Feb 08 '23 06:02
