I can't wrap head around how to do this, but I want to go from this DataFrame:
Date Value
Jan-15 300
Feb-15 302
Mar-15 303
Apr-15 305
May-15 307
Jun-15 307
Jul-15 305
Aug-15 306
Sep-15 308
Oct-15 310
Nov-15 309
Dec-15 312
Jan-16 315
Feb-16 317
Mar-16 315
Apr-16 315
May-16 312
Jun-16 314
Jul-16 312
Aug-16 313
Sep-16 316
Oct-16 316
Nov-16 316
Dec-16 312
To this one by calculating over-the-month and over-the-year change:
Date Value otm oty
Jan-15 300 na na
Feb-15 302 2 na
Mar-15 303 1 na
Apr-15 305 2 na
May-15 307 2 na
Jun-15 307 0 na
Jul-15 305 -2 na
Aug-15 306 1 na
Sep-15 308 2 na
Oct-15 310 2 na
Nov-15 309 -1 na
Dec-15 312 3 na
Jan-16 315 3 15
Feb-16 317 2 15
Mar-16 315 -2 12
Apr-16 315 0 10
May-16 312 -3 5
Jun-16 314 2 7
Jul-16 312 -2 7
Aug-16 313 1 7
Sep-16 316 3 8
Oct-16 316 0 6
Nov-16 316 0 7
Dec-16 312 -4 0
So otm is calculated from the value of the field above and oty is calculated from 12 fields above.
I think you need diff
, but is necessary there are not missing any month in index:
df['otm'] = df.Value.diff()
df['oty'] = df.Value.diff(12)
print (df)
Date Value otm oty
0 Jan-15 300 NaN NaN
1 Feb-15 302 2.0 NaN
2 Mar-15 303 1.0 NaN
3 Apr-15 305 2.0 NaN
4 May-15 307 2.0 NaN
5 Jun-15 307 0.0 NaN
6 Jul-15 305 -2.0 NaN
7 Aug-15 306 1.0 NaN
8 Sep-15 308 2.0 NaN
9 Oct-15 310 2.0 NaN
10 Nov-15 309 -1.0 NaN
11 Dec-15 312 3.0 NaN
12 Jan-16 315 3.0 15.0
13 Feb-16 317 2.0 15.0
14 Mar-16 315 -2.0 12.0
15 Apr-16 315 0.0 10.0
16 May-16 312 -3.0 5.0
17 Jun-16 314 2.0 7.0
18 Jul-16 312 -2.0 7.0
19 Aug-16 313 1.0 7.0
20 Sep-16 316 3.0 8.0
21 Oct-16 316 0.0 6.0
22 Nov-16 316 0.0 7.0
23 Dec-16 312 -4.0 0.0
If some data are missing it is a bit complicated:
to_datetime
+ to_period
set_index
+ reindex
- if are missing first Jan
or last Dec
values better is set it manually, not by min
and maxstrftime
reset_index
df['Date'] = pd.to_datetime(df['Date'], format='%b-%y').dt.to_period('M')
df = df.set_index('Date')
df = df.reindex(pd.period_range(df.index.min(), df.index.max(), freq='M'))
df.index = df.index.strftime('%b-%y')
df = df.rename_axis('date').reset_index()
df['otm'] = df.Value.diff()
df['oty'] = df.Value.diff(12)
print (df)
date Value otm oty
0 Jan-15 300.0 NaN NaN
1 Feb-15 302.0 2.0 NaN
2 Mar-15 NaN NaN NaN
3 Apr-15 NaN NaN NaN
4 May-15 307.0 NaN NaN
5 Jun-15 307.0 0.0 NaN
6 Jul-15 305.0 -2.0 NaN
7 Aug-15 306.0 1.0 NaN
8 Sep-15 308.0 2.0 NaN
9 Oct-15 310.0 2.0 NaN
10 Nov-15 309.0 -1.0 NaN
11 Dec-15 312.0 3.0 NaN
12 Jan-16 315.0 3.0 15.0
13 Feb-16 317.0 2.0 15.0
14 Mar-16 315.0 -2.0 NaN
15 Apr-16 315.0 0.0 NaN
16 May-16 312.0 -3.0 5.0
17 Jun-16 314.0 2.0 7.0
18 Jul-16 312.0 -2.0 7.0
19 Aug-16 313.0 1.0 7.0
20 Sep-16 316.0 3.0 8.0
21 Oct-16 316.0 0.0 6.0
22 Nov-16 316.0 0.0 7.0
23 Dec-16 312.0 -4.0 0.0
More correct solution is to shift by month frequency:
#Create datetime column
df['DateTime'] = pd.to_datetime(df['Date'], format='%b-%y')
#Set it as index
df.set_index('DateTime', inplace=True)
#Then shift by month frequency:
df['otm'] = df['Value'] - df['Value'].shift(1, freq='MS')
df['oty'] = df['Value'] - df['Value'].shift(12, freq='MS')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With