I created a DatetimeIndex from a "date" column:
sales.index = pd.DatetimeIndex(sales["date"])
Now the index looks as follows:
DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-04', '2003-01-06',
               '2003-01-07', '2003-01-08', '2003-01-09', '2003-01-10',
               '2003-01-11', '2003-01-13',
               ...
               '2016-07-22', '2016-07-23', '2016-07-24', '2016-07-25',
               '2016-07-26', '2016-07-27', '2016-07-28', '2016-07-29',
               '2016-07-30', '2016-07-31'],
              dtype='datetime64[ns]', name='date', length=4393, freq=None)
As you can see, the freq attribute is None. I suspect that errors down the road are caused by the missing freq. However, if I try to set the frequency explicitly, I get an error:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-148-30857144de81> in <module>()
      1 #### DEBUG
----> 2 sales_train = disentangle(df_train)
      3 sales_holdout = disentangle(df_holdout)
      4 result = sarima_fit_predict(sales_train.loc[5002, 9990]["amount_sold"], sales_holdout.loc[5002, 9990]["amount_sold"])

<ipython-input-147-08b4c4ecdea3> in disentangle(df_train)
      2     # transform sales table to disentangle sales time series
      3     sales = df_train[["date", "store_id", "article_id", "amount_sold"]]
----> 4     sales.index = pd.DatetimeIndex(sales["date"], freq="d")
      5     sales = sales.pivot_table(index=["store_id", "article_id", "date"])
      6     return sales

/usr/local/lib/python3.6/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
     89                 else:
     90                     kwargs[new_arg_name] = new_arg_value
---> 91             return func(*args, **kwargs)
     92         return wrapper
     93     return _deprecate_kwarg

/usr/local/lib/python3.6/site-packages/pandas/core/indexes/datetimes.py in __new__(cls, data, freq, start, end, periods, copy, name, tz, verify_integrity, normalize, closed, ambiguous, dtype, **kwargs)
    399                                  'dates does not conform to passed '
    400                                  'frequency {1}'
--> 401                                  .format(inferred, freq.freqstr))
    402
    403         if freq_infer:

ValueError: Inferred frequency None from passed dates does not conform to passed frequency D
So apparently a frequency has been inferred, but it is stored in neither the freq nor the inferred_freq attribute of the DatetimeIndex; both are None. Can someone clear up the confusion?
class pandas.DatetimeIndex: Immutable ndarray of datetime64 data, represented internally as int64, which can be boxed to Timestamp objects that are subclasses of datetime and carry metadata such as frequency information.
To convert the index of a DataFrame to a DatetimeIndex, use the pandas to_datetime() function.
To convert a DatetimeIndex to a Series, use the DatetimeIndex.to_series() method.
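The two conversions just mentioned can be sketched as follows. This is a minimal example on a made-up frame (the column name x and the dates are only for illustration), not code from the question.

```python
import pandas as pd

# Hypothetical sample frame whose index holds dates as plain strings.
df = pd.DataFrame({"x": [1, 2, 4]},
                  index=["2003-01-02", "2003-01-03", "2003-01-06"])

# pd.to_datetime applied to an Index returns a DatetimeIndex.
df.index = pd.to_datetime(df.index)
is_datetime_index = isinstance(df.index, pd.DatetimeIndex)

# to_series() returns a Series whose index and values are both the dates.
dates = df.index.to_series()

print(is_datetime_index)
print(dates)
```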
The DatetimeIndex.freq attribute returns the frequency object if one is set on the DatetimeIndex object. If no frequency is set, it returns None.
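A short sketch of how freq behaves, using made-up dates: an index built with pd.date_range carries a frequency, while one built from a list of dates with a gap does not.

```python
import pandas as pd

# date_range attaches the frequency it was generated with.
regular = pd.date_range("2003-01-02", periods=5, freq="D")

# An index built from arbitrary dates has no frequency attached.
irregular = pd.to_datetime(["2003-01-02", "2003-01-03", "2003-01-06"])

print(regular.freq)    # a daily offset object
print(irregular.freq)  # None
```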
Parameters of pandas.DatetimeIndex:
data: Optional datetime-like data to construct index with.
freq: One of pandas date offset strings or corresponding objects. The string 'infer' can be passed in order to set the frequency of the index as the inferred frequency upon creation.
tz: Set the Timezone of the data.
normalize: Normalize start/end dates to midnight before generating date range.
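To illustrate the freq parameter, here is a small sketch with made-up dates. It assumes a reasonably recent pandas, where passing a frequency that the dates do not conform to raises a ValueError, as in the traceback from the question.

```python
import pandas as pd

# Three consecutive days: freq="infer" can detect a daily frequency.
idx = pd.DatetimeIndex(["2003-01-02", "2003-01-03", "2003-01-04"], freq="infer")
print(idx.freq)

# Same start as the index in the question: 2003-01-05 is missing,
# so nothing can be inferred and freq simply stays None.
gapped_dates = ["2003-01-02", "2003-01-03", "2003-01-04", "2003-01-06"]
gapped = pd.DatetimeIndex(gapped_dates, freq="infer")
print(gapped.freq)

# Forcing freq="D" onto dates that are not strictly daily fails.
raised = False
try:
    pd.DatetimeIndex(gapped_dates, freq="D")
except ValueError as err:
    raised = True
    print(err)
```

This suggests the error message is not saying that a frequency was found; it reports that the inferred frequency is None, which does not match the requested D.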
You have a couple of options here:
pd.infer_freq
pd.tseries.frequencies.to_offset
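As a rough illustration of what these two functions do (the dates are made up): pd.infer_freq returns a frequency string or None, and to_offset turns a frequency string into an offset object.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

daily = pd.to_datetime(["2003-01-02", "2003-01-03", "2003-01-04"])
gapped = pd.to_datetime(["2003-01-02", "2003-01-03", "2003-01-04", "2003-01-06"])

# infer_freq returns a frequency string, or None if no regular pattern fits.
daily_freq = pd.infer_freq(daily)
gapped_freq = pd.infer_freq(gapped)
print(daily_freq, gapped_freq)

# to_offset converts a frequency string into a DateOffset object.
offset = to_offset("D")
print(offset)
```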
Regarding "I suspect that errors down the road are caused by the missing freq": you are absolutely right. Here's what I often use:
import pandas as pd

def add_freq(idx, freq=None):
    """Add a frequency attribute to idx, through inference or directly.

    Returns a copy.  If `freq` is None, it is inferred.
    """
    idx = idx.copy()
    if freq is None:
        if idx.freq is None:
            # No frequency set and none requested: try to infer one.
            freq = pd.infer_freq(idx)
        else:
            # A frequency is already set; nothing to do.
            return idx
    idx.freq = pd.tseries.frequencies.to_offset(freq)
    if idx.freq is None:
        raise AttributeError('no discernible frequency found to `idx`.  Specify'
                             ' a frequency string with `freq`.')
    return idx
An example:
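idx = pd.to_datetime(['2003-01-02', '2003-01-03', '2003-01-06'])  # freq=None

print(add_freq(idx))  # inferred
DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-06'], dtype='datetime64[ns]', freq='B')

# Explicit. The output below comes from an older pandas; newer versions
# validate the frequency on assignment and may raise a ValueError here,
# because these dates are not strictly daily.
print(add_freq(idx, freq='D'))
DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-06'], dtype='datetime64[ns]', freq='D')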
Using asfreq will actually reindex (fill) missing dates, so be careful if that's not what you're looking for.
The primary function for changing frequencies is the asfreq function. For a DatetimeIndex, this is basically just a thin but convenient wrapper around reindex which generates a date_range and calls reindex.
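The relationship between asfreq and reindex described above can be sketched like this, on made-up data. The manual version is only an approximation of what asfreq does internally, written out for comparison.

```python
import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 4]},
    index=pd.to_datetime(["2003-01-02", "2003-01-03", "2003-01-06"]),
)

# One step: asfreq fills in the missing days with NaN rows.
via_asfreq = df.asfreq("D")

# Two steps by hand: build the full daily range, then reindex onto it.
full_range = pd.date_range(df.index.min(), df.index.max(), freq="D")
via_reindex = df.reindex(full_range)

print(via_asfreq)
print(via_reindex)
```

Both frames should contain five rows, with NaN for 2003-01-04 and 2003-01-05.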
It seems to relate to missing dates, as 3kt notes. You might be able to "fix" it with asfreq('D'), as EdChum suggests, but that gives you a continuous index with missing data values. It works fine for some sample data I made up:
df = pd.DataFrame({'x': [1, 2, 4]},
                  index=pd.to_datetime(['2003-01-02', '2003-01-03', '2003-01-06']))

df
Out[756]:
            x
2003-01-02  1
2003-01-03  2
2003-01-06  4

df.index
Out[757]: DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-06'], dtype='datetime64[ns]', freq=None)
Note that freq=None. If you apply asfreq('D'), this changes to freq='D':
df.asfreq('D')
Out[758]:
              x
2003-01-02  1.0
2003-01-03  2.0
2003-01-04  NaN
2003-01-05  NaN
2003-01-06  4.0

df.asfreq('d').index
Out[759]: DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-04', '2003-01-05', '2003-01-06'], dtype='datetime64[ns]', freq='D')
More generally, and depending on what exactly you are trying to do, you might want to check out the following for other options like reindex & resample: Add missing dates to pandas dataframe
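For reference, here is a minimal sketch of the reindex and resample options on the same kind of sample data. Treating a missing day as zero is an assumption made for this example (plausible for sales counts, but it depends on your data).

```python
import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 4]},
    index=pd.to_datetime(["2003-01-02", "2003-01-03", "2003-01-06"]),
)

# Find out which days are actually missing before filling anything.
full_range = pd.date_range(df.index.min(), df.index.max(), freq="D")
missing = full_range.difference(df.index)
print(missing)

# Option 1: reindex onto the full range, filling gaps with 0 instead of NaN.
filled_zero = df.reindex(full_range, fill_value=0)

# Option 2: resample to daily bins; the sum of an empty bin is 0.
daily_sum = df.resample("D").sum()

print(filled_zero)
print(daily_sum)
```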