Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using pandas to calculate over-the-month and over-the-year change

Tags:

python

pandas

I can't wrap head around how to do this, but I want to go from this DataFrame:

Date    Value
Jan-15  300
Feb-15  302
Mar-15  303
Apr-15  305
May-15  307
Jun-15  307
Jul-15  305
Aug-15  306
Sep-15  308
Oct-15  310
Nov-15  309
Dec-15  312
Jan-16  315
Feb-16  317
Mar-16  315
Apr-16  315
May-16  312
Jun-16  314
Jul-16  312
Aug-16  313
Sep-16  316
Oct-16  316
Nov-16  316
Dec-16  312

To this one by calculating over-the-month and over-the-year change:

Date    Value  otm  oty
Jan-15  300    na   na
Feb-15  302    2    na
Mar-15  303    1    na
Apr-15  305    2    na
May-15  307    2    na
Jun-15  307    0    na
Jul-15  305    -2   na
Aug-15  306    1    na
Sep-15  308    2    na
Oct-15  310    2    na
Nov-15  309    -1   na
Dec-15  312    3    na
Jan-16  315    3    15
Feb-16  317    2    15
Mar-16  315    -2   12
Apr-16  315    0    10
May-16  312    -3   5
Jun-16  314    2    7
Jul-16  312    -2   7
Aug-16  313    1    7
Sep-16  316    3    8
Oct-16  316    0    6
Nov-16  316    0    7
Dec-16  312    -4   0

So otm is calculated from the value of the field above and oty is calculated from 12 fields above.

like image 237
Ben Avatar asked Jan 05 '23 00:01

Ben


2 Answers

I think you need diff, but is necessary there are not missing any month in index:

df['otm'] = df.Value.diff()
df['oty'] = df.Value.diff(12)
print (df)
      Date  Value  otm   oty
0   Jan-15    300  NaN   NaN
1   Feb-15    302  2.0   NaN
2   Mar-15    303  1.0   NaN
3   Apr-15    305  2.0   NaN
4   May-15    307  2.0   NaN
5   Jun-15    307  0.0   NaN
6   Jul-15    305 -2.0   NaN
7   Aug-15    306  1.0   NaN
8   Sep-15    308  2.0   NaN
9   Oct-15    310  2.0   NaN
10  Nov-15    309 -1.0   NaN
11  Dec-15    312  3.0   NaN
12  Jan-16    315  3.0  15.0
13  Feb-16    317  2.0  15.0
14  Mar-16    315 -2.0  12.0
15  Apr-16    315  0.0  10.0
16  May-16    312 -3.0   5.0
17  Jun-16    314  2.0   7.0
18  Jul-16    312 -2.0   7.0
19  Aug-16    313  1.0   7.0
20  Sep-16    316  3.0   8.0
21  Oct-16    316  0.0   6.0
22  Nov-16    316  0.0   7.0
23  Dec-16    312 -4.0   0.0

If some data are missing it is a bit complicated:

  • convert to to_datetime + to_period
  • set_index + reindex - if are missing first Jan or last Dec values better is set it manually, not by min and max
  • change format of index by strftime
  • reset_index

df['Date'] = pd.to_datetime(df['Date'], format='%b-%y').dt.to_period('M')
df = df.set_index('Date')
df = df.reindex(pd.period_range(df.index.min(), df.index.max(), freq='M'))
df.index = df.index.strftime('%b-%y')
df = df.rename_axis('date').reset_index()

df['otm'] = df.Value.diff()
df['oty'] = df.Value.diff(12)

print (df)
      date  Value  otm   oty
0   Jan-15  300.0  NaN   NaN
1   Feb-15  302.0  2.0   NaN
2   Mar-15    NaN  NaN   NaN
3   Apr-15    NaN  NaN   NaN
4   May-15  307.0  NaN   NaN
5   Jun-15  307.0  0.0   NaN
6   Jul-15  305.0 -2.0   NaN
7   Aug-15  306.0  1.0   NaN
8   Sep-15  308.0  2.0   NaN
9   Oct-15  310.0  2.0   NaN
10  Nov-15  309.0 -1.0   NaN
11  Dec-15  312.0  3.0   NaN
12  Jan-16  315.0  3.0  15.0
13  Feb-16  317.0  2.0  15.0
14  Mar-16  315.0 -2.0   NaN
15  Apr-16  315.0  0.0   NaN
16  May-16  312.0 -3.0   5.0
17  Jun-16  314.0  2.0   7.0
18  Jul-16  312.0 -2.0   7.0
19  Aug-16  313.0  1.0   7.0
20  Sep-16  316.0  3.0   8.0
21  Oct-16  316.0  0.0   6.0
22  Nov-16  316.0  0.0   7.0
23  Dec-16  312.0 -4.0   0.0
like image 127
jezrael Avatar answered Jan 06 '23 15:01

jezrael


More correct solution is to shift by month frequency:

#Create datetime column
df['DateTime'] = pd.to_datetime(df['Date'], format='%b-%y')

#Set it as index
df.set_index('DateTime', inplace=True)

#Then shift by month frequency:
df['otm'] = df['Value'] - df['Value'].shift(1, freq='MS')
df['oty'] = df['Value'] - df['Value'].shift(12, freq='MS')
like image 24
amir.rafieian Avatar answered Jan 06 '23 13:01

amir.rafieian