
Pandas `period_range` gives strange results

Tags: python, pandas

I want a pandas period range with a 25-hour offset, and I saw that there are two ways to do this (see here):

The first way is to use freq=25H, which I tried, and it gave me the right answer:

import pandas as pd
pd.period_range(start='2016-01-01 10:00', freq = '25H', periods = 10)

and the result is

PeriodIndex(['2016-01-01 10:00', '2016-01-02 11:00', '2016-01-03 12:00',
             '2016-01-04 13:00', '2016-01-05 14:00', '2016-01-06 15:00',
             '2016-01-07 16:00', '2016-01-08 17:00', '2016-01-09 18:00',
             '2016-01-10 19:00'],
            dtype='int64', freq='25H')   

The second way, using freq=1D1H, however, gave me a rather strange result:

pd.period_range(start='2016-01-01 10:00', freq = '1D1H', periods = 10)

and I got

 PeriodIndex(['1971-12-02 01:00', '1971-12-02 02:00', '1971-12-02 03:00',
              '1971-12-02 04:00', '1971-12-02 05:00', '1971-12-02 06:00',
              '1971-12-02 07:00', '1971-12-02 08:00', '1971-12-02 09:00',
              '1971-12-02 10:00'],
            dtype='int64', freq='25H')

So maybe 1D1H is not a valid way to specify a frequency? And where did 1971 come from? (I also tried using 1D1H as the frequency for the date_range() method, which did yield the right answer.)

pd.date_range('2016-01-01 10:00', freq = '1D1H', periods = 10)
DatetimeIndex(['2016-01-01 10:00:00', '2016-01-02 11:00:00',
               '2016-01-03 12:00:00', '2016-01-04 13:00:00',
               '2016-01-05 14:00:00', '2016-01-06 15:00:00',
               '2016-01-07 16:00:00', '2016-01-08 17:00:00',
               '2016-01-09 18:00:00', '2016-01-10 19:00:00'],
              dtype='datetime64[ns]', freq='25H')

EDIT: it appears that with period_range(), though freq=1D1H doesn't work, freq=1H1D does. The reason is still unknown.

EDIT2: this has been identified as a bug, see the answer below.

Olivier Ma asked Jul 31 '16 06:07




1 Answer

The bug has already been identified and reported on GitHub.

EDIT: A fix has been merged and will be included in v0.19.
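
Until you are on a version with the fix, one way to sidestep the compound alias is to avoid parsing a frequency string at all and build the 25-hour step from a Timedelta. The sketch below is only an illustration, not the fix from the bug report: it assumes a reasonably recent pandas, and the variable names (step, offset, stamps, periods) are made up for the example.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

# One day plus one hour is the same fixed span as 25 hours.
step = pd.Timedelta(days=1, hours=1)

# Build the offset from the Timedelta, so no alias string such as
# '1D1H' has to be parsed (assumption: to_offset accepts a Timedelta,
# which holds on current pandas releases).
offset = to_offset(step)

# The same offset object drives both range constructors.
stamps = pd.date_range(start="2016-01-01 10:00", periods=10, freq=offset)
periods = pd.period_range(start="2016-01-01 10:00", periods=10, freq=offset)

print(stamps[-1])              # last timestamp of the range
print(periods[-1].start_time)  # start of the last period
```

Both ranges should end at 2016-01-10 19:00, which matches the date_range() output in the question (10:00 plus nine steps of 25 hours).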

A. Garcia-Raboso answered Oct 26 '22 12:10