Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to multiply all hourly values in one pandas dataframe with yearly values in another dataframe?

I have the following hourly dataframe dfA:

Date/Time            Value1    Value2
01.03.2010 00:00:00  60        10
01.03.2010 01:00:00  50        20 
01.03.2010 02:00:00  52        30
01.03.2010 03:00:00  49        40
.
.
.
31.12.2013 23:00:00  77        50

And I have a second dataframe dfB with annual vaules:

Date/Time   Value1    Value2
31.12.2010   1.5        0.9
31.12.2011   1.6        1.1 
31.12.2012   1.7        2.3
31.12.2013   1.3        0.6

I would like to multiply each hourly value in dfA with the factor of the corresponding year in dataframe dfB.

The result should look like this:

Date/Time            Value1    Value2
01.03.2010  00:00:00    90        9
01.03.2010  01:00:00    75       18
01.03.2010  02:00:00    78       27
01.03.2010  03:00:00    73.5     36
.           
.           
.           
31.12.2013  23:00:00    100.1    30

I have been trying with dfC = dfA*dfB[dfA.index.year()] but I am getting the error TypeError: 'numpy.ndarray' object is not callable. Can anyone help me with this?

like image 273
Markus W Avatar asked Mar 13 '23 04:03

Markus W


2 Answers

You can try append to index of df1 df1.index.year, then change index of df2 to years and then use mul:

print df1
                     Value1  Value2
Date/Time                          
2010-01-03 00:00:00      60      10
2010-01-03 01:00:00      50      20
2010-01-03 02:00:00      52      30
2010-01-03 03:00:00      49      40
2013-12-31 23:00:00      77      50

print df2
            Value1  Value2
Date/Time                 
2010-12-31     1.5     0.9
2011-12-31     1.6     1.1
2012-12-31     1.7     2.3
2013-12-31     1.3     0.6

df1 = df1.set_index(df1.index.year, append=True)
df2.index = df2.index.year
print df1
                          Value1  Value2
Date/Time                               
2010-01-03 00:00:00 2010      60      10
2010-01-03 01:00:00 2010      50      20
2010-01-03 02:00:00 2010      52      30
2010-01-03 03:00:00 2010      49      40
2013-12-31 23:00:00 2013      77      50

print df2
      Value1  Value2
2010     1.5     0.9
2011     1.6     1.1
2012     1.7     2.3
2013     1.3     0.6

print df1.mul(df2, level=1).reset_index(drop=True, level=1)
                     Value1  Value2
Date/Time                          
2010-01-03 00:00:00    90.0       9
2010-01-03 01:00:00    75.0      18
2010-01-03 02:00:00    78.0      27
2010-01-03 03:00:00    73.5      36
2013-12-31 23:00:00   100.1      30
like image 113
jezrael Avatar answered Apr 27 '23 21:04

jezrael


The easiest way is probably to upsample the lower frequency series / data frame to the higher frequency, and then multiply.

In [82]: s1 = pd.Series(np.random.randn(31), index=pd.date_range(start='2015-01-01', end='2015-01-31', freq='d'))

In [83]: s2 = pd.Series(np.random.randn(4), index=pd.date_range(start='2015-01-01', end='2015-01-31', freq='W'))

In [84]: s2.resample('d').mean().reindex(s1.index).ffill().bfill() * s1
Out[84]:
2015-01-01    0.361944
2015-01-02    2.806391
2015-01-03    0.741745
2015-01-04    0.855619
2015-01-05   -2.127828
                ...
2015-01-27    0.533919
2015-01-28    0.792278
2015-01-29    1.722754
2015-01-30    0.822032
2015-01-31    0.729741
Freq: D, dtype: float64

We need that additional redindex(s1.index) in this case since the start and end don't align exactly.

like image 39
TomAugspurger Avatar answered Apr 27 '23 19:04

TomAugspurger