
pandas.DatetimeIndex frequency is None and can't be set

I created a DatetimeIndex from a "date" column:

sales.index = pd.DatetimeIndex(sales["date"]) 

Now the index looks as follows:

DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-04', '2003-01-06',
               '2003-01-07', '2003-01-08', '2003-01-09', '2003-01-10',
               '2003-01-11', '2003-01-13',
               ...
               '2016-07-22', '2016-07-23', '2016-07-24', '2016-07-25',
               '2016-07-26', '2016-07-27', '2016-07-28', '2016-07-29',
               '2016-07-30', '2016-07-31'],
              dtype='datetime64[ns]', name='date', length=4393, freq=None)

As you see, the freq attribute is None. I suspect that errors down the road are caused by the missing freq. However, if I try to set the frequency explicitly:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-148-30857144de81> in <module>()
      1 #### DEBUG
----> 2 sales_train = disentangle(df_train)
      3 sales_holdout = disentangle(df_holdout)
      4 result = sarima_fit_predict(sales_train.loc[5002, 9990]["amount_sold"], sales_holdout.loc[5002, 9990]["amount_sold"])

<ipython-input-147-08b4c4ecdea3> in disentangle(df_train)
      2     # transform sales table to disentangle sales time series
      3     sales = df_train[["date", "store_id", "article_id", "amount_sold"]]
----> 4     sales.index = pd.DatetimeIndex(sales["date"], freq="d")
      5     sales = sales.pivot_table(index=["store_id", "article_id", "date"])
      6     return sales

/usr/local/lib/python3.6/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
     89                 else:
     90                     kwargs[new_arg_name] = new_arg_value
---> 91             return func(*args, **kwargs)
     92         return wrapper
     93     return _deprecate_kwarg

/usr/local/lib/python3.6/site-packages/pandas/core/indexes/datetimes.py in __new__(cls, data, freq, start, end, periods, copy, name, tz, verify_integrity, normalize, closed, ambiguous, dtype, **kwargs)
    399                                          'dates does not conform to passed '
    400                                          'frequency {1}'
--> 401                                          .format(inferred, freq.freqstr))
    402
    403         if freq_infer:

ValueError: Inferred frequency None from passed dates does not conform to passed frequency D

So apparently a frequency has been inferred, but is stored neither in the freq nor inferred_freq attribute of the DatetimeIndex - both are None. Can someone clear up the confusion?

clstaudt asked Sep 14 '17 11:09



2 Answers

You have a couple options here:

  • pd.infer_freq
  • pd.tseries.frequencies.to_offset

"I suspect that errors down the road are caused by the missing freq."

You are absolutely right. Here's what I use often:

import pandas as pd

def add_freq(idx, freq=None):
    """Add a frequency attribute to idx, through inference or directly.

    Returns a copy.  If `freq` is None, it is inferred.
    """

    idx = idx.copy()
    if freq is None:
        if idx.freq is None:
            # No frequency given and none set: try to infer one from the dates.
            freq = pd.infer_freq(idx)
        else:
            # A frequency is already set; nothing to do.
            return idx
    idx.freq = pd.tseries.frequencies.to_offset(freq)
    if idx.freq is None:
        raise AttributeError('no discernible frequency found to `idx`.  Specify'
                             ' a frequency string with `freq`.')
    return idx

An example:

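idx = pd.to_datetime(['2003-01-02', '2003-01-03', '2003-01-06'])  # freq=None

print(add_freq(idx))  # inferred
DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-06'], dtype='datetime64[ns]', freq='B')

# Output below is from the pandas version in use when this was written (2017).
# Newer releases validate the frequency on assignment and may raise a
# ValueError here, because these dates are not consecutive calendar days.
print(add_freq(idx, freq='D'))  # explicit
DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-06'], dtype='datetime64[ns]', freq='D')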
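
To see why no frequency can be inferred for the index in the question, it may help to list the dates that a gap-free daily index would contain but the data does not. The following is only a sketch: it uses the first ten dates shown in the question, and the variable names are illustrative.

```python
import pandas as pd

# First ten dates from the question's index (Sundays are absent).
idx = pd.to_datetime([
    "2003-01-02", "2003-01-03", "2003-01-04", "2003-01-06", "2003-01-07",
    "2003-01-08", "2003-01-09", "2003-01-10", "2003-01-11", "2003-01-13",
])

# No single offset reproduces these dates, so inference yields None.
inferred = pd.infer_freq(idx)

# Dates a gap-free daily range would contain that are missing from idx.
full = pd.date_range(idx.min(), idx.max(), freq="D")
missing = full.difference(idx)

print(inferred)
print(list(missing.strftime("%Y-%m-%d")))
```

If the missing dates follow a pattern (here, every Sunday), that explains both the freq=None and the "does not conform to passed frequency D" error.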

Using asfreq will actually reindex (fill) missing dates, so be careful of that if that's not what you're looking for.

From the pandas documentation: "The primary function for changing frequencies is the asfreq function. For a DatetimeIndex, this is basically just a thin, but convenient wrapper around reindex which generates a date_range and calls reindex."
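
The relationship between asfreq and reindex can be checked on a small frame. This is a rough sketch on made-up data, not a description of how pandas implements asfreq internally:

```python
import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 4]},
    index=pd.to_datetime(["2003-01-02", "2003-01-03", "2003-01-06"]),
)

# asfreq("D") inserts the missing calendar days as NaN rows.
via_asfreq = df.asfreq("D")

# Roughly the same thing spelled out: build the full range, then reindex.
full_range = pd.date_range(df.index.min(), df.index.max(), freq="D")
via_reindex = df.reindex(full_range)

print(via_asfreq.equals(via_reindex))
print(via_asfreq.index.freqstr)
```

Both results have five rows, two of which hold NaN, so any downstream model has to cope with those gaps.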

Brad Solomon answered Sep 20 '22 12:09


It seems to relate to missing dates, as 3kt notes. You might be able to "fix" it with asfreq('D'), as EdChum suggests, but that gives you a continuous index with missing data values. It works fine for some sample data I made up:

df = pd.DataFrame({'x': [1, 2, 4]},
    index=pd.to_datetime(['2003-01-02', '2003-01-03', '2003-01-06']))

df
Out[756]:
            x
2003-01-02  1
2003-01-03  2
2003-01-06  4

df.index
Out[757]:
DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-06'],
           dtype='datetime64[ns]', freq=None)

Note that freq=None. If you apply asfreq('D'), this changes to freq='D':

df.asfreq('D')
Out[758]:
              x
2003-01-02  1.0
2003-01-03  2.0
2003-01-04  NaN
2003-01-05  NaN
2003-01-06  4.0

df.asfreq('d').index
Out[759]:
DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-04', '2003-01-05',
               '2003-01-06'],
              dtype='datetime64[ns]', freq='D')

More generally, and depending on what exactly you are trying to do, you might want to check out the following for other options like reindex and resample: "Add missing dates to pandas dataframe"
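
As a rough illustration of those options, here are three ways to get a daily index without leaving NaN values. The series is hypothetical, and which fill rule is appropriate depends on the data: treating a missing day as zero sales is an assumption, not something the question states.

```python
import pandas as pd

# Hypothetical sales series with two missing calendar days.
sales = pd.Series(
    [1, 2, 4],
    index=pd.to_datetime(["2003-01-02", "2003-01-03", "2003-01-06"]),
    name="amount_sold",
)

# Option 1: asfreq with a fill value (assumes a missing day means zero sales).
filled_zero = sales.asfreq("D", fill_value=0)

# Option 2: resample to daily bins and aggregate; empty days sum to 0.
resampled = sales.resample("D").sum()

# Option 3: reindex onto an explicit range and carry the last value forward.
full_range = pd.date_range(sales.index.min(), sales.index.max(), freq="D")
carried = sales.reindex(full_range, method="ffill")

print(filled_zero.tolist())
print(resampled.tolist())
print(carried.tolist())
```

Options 1 and 2 agree here because each date appears once; with duplicate dates, resample would aggregate them while asfreq would not.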

JohnE answered Sep 18 '22 12:09