
Convert a column of timestamps into periods in pandas

I have a column of timestamps that needs to be converted into periods ('Month'), e.g.

1985-12-31 00:00:00 to 1985-12

Pandas has a .to_period() function, but:

  • pd.DatetimeIndex.to_period only works on a timestamp index, not on a column. So you can have a period index, but not a period column?
  • and that function only works if the timestamps are the only index, i.e. not if they are part of a MultiIndex.

So how do I use this on an arbitrary pandas column, not just a timestamp index or period index?

user3576212 asked May 24 '14 02:05




2 Answers

I came across this thread today, and after further digging found that pandas 0.15 offers an easier option: the .dt accessor. With it you can skip the step of creating an index and create the period column directly. The following gives the same result:

df[1] = df[0].dt.to_period('M')
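As a self-contained illustration of the .dt approach, here is a minimal sketch. The column names "ts" and "month" and the sample dates are assumptions made up for the example; the only requirement is that the source column already has a datetime dtype.

```python
import pandas as pd

# Hypothetical sample data; "ts" is an illustrative column name.
df = pd.DataFrame(
    {"ts": pd.to_datetime(["1985-12-31", "1986-01-15", "1986-02-01"])}
)

# .dt is only available on datetime-like columns, so convert with
# pd.to_datetime first if the column holds strings.
df["month"] = df["ts"].dt.to_period("M")

# String form of each period, e.g. for display or grouping keys.
month_strings = df["month"].astype(str).tolist()
print(month_strings)
```

The new column holds Period values rather than strings, so it can still be used for period arithmetic or converted back with .dt.to_timestamp().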
mattvivier answered Oct 04 '22 12:10


You're right, you need to do this on DatetimeIndex objects rather than on plain columns of datetimes. However, this is pretty easy: just wrap the column in a DatetimeIndex constructor:

In [11]: df = pd.DataFrame(pd.date_range('2014-01-01', freq='2W', periods=12))

In [12]: df
Out[12]:
            0
0  2014-01-05
1  2014-01-19
2  2014-02-02
3  2014-02-16
4  2014-03-02
5  2014-03-16
6  2014-03-30
7  2014-04-13
8  2014-04-27
9  2014-05-11
10 2014-05-25
11 2014-06-08

In [13]: pd.DatetimeIndex(df[0]).to_period('M')
Out[13]:
<class 'pandas.tseries.period.PeriodIndex'>
freq: M
[2014-01, ..., 2014-06]
length: 12

This is a PeriodIndex, but you can make it a column:

In [14]: df[1] = pd.DatetimeIndex(df[0]).to_period('M')

In [15]: df
Out[15]:
            0        1
0  2014-01-05  2014-01
1  2014-01-19  2014-01
2  2014-02-02  2014-02
3  2014-02-16  2014-02
4  2014-03-02  2014-03
5  2014-03-16  2014-03
6  2014-03-30  2014-03
7  2014-04-13  2014-04
8  2014-04-27  2014-04
9  2014-05-11  2014-05
10 2014-05-25  2014-05
11 2014-06-08  2014-06

You can do a similar trick if the timestamps are part of a MultiIndex: extract that level with df.index.get_level_values and pass it to DatetimeIndex as above. For example:

df[2] = 2
df.set_index([0, 1], inplace=True)
df.index.get_level_values(0)  # returns a DatetimeIndex
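To carry the MultiIndex case through to the actual period conversion, here is a minimal sketch. The level names "date" and "key", the "value" column, and the sample dates are all assumptions for illustration. It shows two options: storing the periods as a column, or rebuilding the index with the period level in place of the timestamps.

```python
import pandas as pd

# Hypothetical frame with a two-level index; the names are illustrative.
dates = pd.to_datetime(["2014-01-05", "2014-01-19", "2014-02-02", "2014-02-16"])
index = pd.MultiIndex.from_arrays([dates, list("abcd")], names=["date", "key"])
df = pd.DataFrame({"value": range(4)}, index=index)

# Pull out the datetime level (a DatetimeIndex) and convert it to months.
months = df.index.get_level_values("date").to_period("M")

# Option 1: keep the periods as an ordinary column (assigned positionally).
df["month"] = months

# Option 2: build a new MultiIndex that uses the periods instead of timestamps.
df_by_month = df.drop(columns="month")
df_by_month.index = pd.MultiIndex.from_arrays(
    [months, df.index.get_level_values("key")], names=["month", "key"]
)

month_strings = list(months.astype(str))
print(month_strings)
```

Rebuilding the index with from_arrays avoids having to reason about the codes behind each level, at the cost of constructing a new index object.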
Andy Hayden answered Oct 04 '22 12:10