Convert a column of timestamps into periods in pandas

I have a column of timestamps that needs to be converted into periods (monthly), e.g.

1985-12-31 00:00:00 to 1985-12

Pandas has a .to_period() function, but:

  • pd.DatetimeIndex.to_period only works on a timestamp index, not a column. So you can have a period index, but not a period column?
  • That function only works if the timestamps are the only index, i.e. not if they are part of a MultiIndex.

Anyway, how do I use this on an arbitrary pandas column, not just a timestamp index or period index?

2 Answers

I came across this thread today, and after further digging found that pandas 0.15 offers an easier option: the .dt accessor. With it you can skip the step of creating an index and create the column directly. The following gives the same result:

df[1] = df[0].dt.to_period('M')
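
As a self-contained sketch of the .dt approach, using the date from the question: the column names "ts" and "month" are made up for illustration, and the period dtype check assumes a reasonably recent pandas (older releases stored periods in an object column).

```python
import pandas as pd

# Hypothetical frame; the column name "ts" is an assumption for this example
df = pd.DataFrame({"ts": pd.to_datetime(["1985-12-31", "1986-01-15", "1986-01-31"])})

# .dt exposes datetime methods on a Series; to_period("M") gives monthly periods
df["month"] = df["ts"].dt.to_period("M")

# String form of each period, e.g. "1985-12"
month_labels = df["month"].astype(str).tolist()
```

The source column has to be of datetime dtype for .dt to work; if it holds strings, converting it with pd.to_datetime first is one way to get there.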

You're right, you need to do this on DatetimeIndex objects rather than on plain columns of datetimes. However, this is pretty easy: just wrap the column in a DatetimeIndex constructor:

In [11]: df = pd.DataFrame(pd.date_range('2014-01-01', freq='2W', periods=12))

In [12]: df
            0
0  2014-01-05
1  2014-01-19
2  2014-02-02
3  2014-02-16
4  2014-03-02
5  2014-03-16
6  2014-03-30
7  2014-04-13
8  2014-04-27
9  2014-05-11
10 2014-05-25
11 2014-06-08

In [13]: pd.DatetimeIndex(df[0]).to_period('M')
<class 'pandas.tseries.period.PeriodIndex'>
freq: M
[2014-01, ..., 2014-06]
length: 12

This is a PeriodIndex, but you can make it a column:

In [14]: df[1] = pd.DatetimeIndex(df[0]).to_period('M')

In [15]: df
            0        1
0  2014-01-05  2014-01
1  2014-01-19  2014-01
2  2014-02-02  2014-02
3  2014-02-16  2014-02
4  2014-03-02  2014-03
5  2014-03-16  2014-03
6  2014-03-30  2014-03
7  2014-04-13  2014-04
8  2014-04-27  2014-04
9  2014-05-11  2014-05
10 2014-05-25  2014-05
11 2014-06-08  2014-06

You can do a similar trick if the timestamps are part of a MultiIndex: extract that level with df.index.get_level_values and pass it to DatetimeIndex as above. For example:

df[2] = 2
df.set_index([0, 1], inplace=True)
df.index.get_level_values(0)  # returns a DatetimeIndex
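
To round out the MultiIndex case, here is a minimal sketch that goes one step further and converts the extracted level to periods. The frame, the level names "date" and "key", and the "month" column are hypothetical; rebuilding the index with MultiIndex.from_arrays is just one possible way to swap the level, not the only one.

```python
import pandas as pd

# Hypothetical frame with a (date, key) MultiIndex; all names are assumptions
dates = pd.date_range("2014-01-05", periods=4, freq="14D")
df = pd.DataFrame({"date": dates, "key": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})
df = df.set_index(["date", "key"])

# The datetime level comes back as a DatetimeIndex, so to_period is available
level = df.index.get_level_values("date")
periods = level.to_period("M")

# Option 1: keep the index as-is and store the periods in a regular column
df["month"] = periods

# Option 2: build a new MultiIndex whose first level holds the periods
by_month = df.copy()
by_month.index = pd.MultiIndex.from_arrays(
    [periods, df.index.get_level_values("key")], names=["month", "key"]
)
```

Rebuilding from arrays sidesteps the uniqueness requirement on level values, which matters here because several dates map to the same month.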
