I have a DataFrame with monthly index. I want to examine whether the time index is continuous on the monthly frequency, and, if possible, spots where it becomes discontinuous e.g. has certain "gap months" between two months that are adjacent in its index.
Example: the following time series data
1964-07-31 100.00
1964-08-31 98.81
1964-09-30 101.21
1964-11-30 101.42
1964-12-31 101.45
1965-03-31 91.49
1965-04-30 90.33
1965-05-31 85.23
1965-06-30 86.10
1965-08-31 84.26
misses 1964/10, 1965/[1,2,7].
Use asfreq
by month for add missing datetimes, filter it to new Series
and if necessary grouping by years with create list of months:
s = s.asfreq('m')
s1 = pd.Series(s[s.isnull()].index)
print (s1)
0 1964-10-31
1 1965-01-31
2 1965-02-28
3 1965-07-31
Name: 0, dtype: datetime64[ns]
out = s1.dt.month.groupby(s1.dt.year).apply(list)
print (out)
0
1964 [10]
1965 [1, 2, 7]
Name: 0, dtype: object
Setup:
s = pd.Series({pd.Timestamp('1964-07-31 00:00:00'): 100.0,
pd.Timestamp('1964-08-31 00:00:00'): 98.81,
pd.Timestamp('1964-09-30 00:00:00'): 101.21,
pd.Timestamp('1964-11-30 00:00:00'): 101.42,
pd.Timestamp('1964-12-31 00:00:00'): 101.45,
pd.Timestamp('1965-03-31 00:00:00'): 91.49,
pd.Timestamp('1965-04-30 00:00:00'): 90.33,
pd.Timestamp('1965-05-31 00:00:00'): 85.23,
pd.Timestamp('1965-06-30 00:00:00'): 86.1,
pd.Timestamp('1965-08-31 00:00:00'): 84.26})
print (s)
1964-07-31 100.00
1964-08-31 98.81
1964-09-30 101.21
1964-11-30 101.42
1964-12-31 101.45
1965-03-31 91.49
1965-04-30 90.33
1965-05-31 85.23
1965-06-30 86.10
1965-08-31 84.26
dtype: float64
EDIT:
If datetimes are not always last day of months:
s = pd.Series({pd.Timestamp('1964-07-31 00:00:00'): 100.0,
pd.Timestamp('1964-08-31 00:00:00'): 98.81,
pd.Timestamp('1964-09-01 00:00:00'): 101.21,
pd.Timestamp('1964-11-02 00:00:00'): 101.42,
pd.Timestamp('1964-12-05 00:00:00'): 101.45,
pd.Timestamp('1965-03-31 00:00:00'): 91.49,
pd.Timestamp('1965-04-30 00:00:00'): 90.33,
pd.Timestamp('1965-05-31 00:00:00'): 85.23,
pd.Timestamp('1965-06-30 00:00:00'): 86.1,
pd.Timestamp('1965-08-31 00:00:00'): 84.26})
print (s)
1964-07-31 100.00
1964-08-31 98.81
1964-09-01 101.21
1964-11-02 101.42
1964-12-05 101.45
1965-03-31 91.49
1965-04-30 90.33
1965-05-31 85.23
1965-06-30 86.10
1965-08-31 84.26
dtype: float64
#convert all months to first day
s.index = s.index.to_period('m').to_timestamp()
#MS is start month frequency
s = s.asfreq('MS')
s1 = pd.Series(s[s.isnull()].index)
print (s1)
0 1964-10-01
1 1965-01-01
2 1965-02-01
3 1965-07-01
dtype: datetime64[ns]
I often do that by calculating the gap between each index value.
times_gaps = df.index - df.index.shift(1)
Then you can plot those:
times_gaps.plot()
If there are gaps, you will quicky see where. If there are no gap, you will see a straight horizontale line.
You can also select gaps times doing:
times_gaps[times_gaps> threshold]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With