
Converting month to quarter in Pandas dataframe

Tags: python, pandas

I have a column in my data frame denoting the month (in the form yyyy-mm), and I want to convert it to a quarter using pd.Period. I tried apply in the form below, but it runs too slowly. Is there a better way to do this? I am using:

hp2['Qtr'] = hp2.apply(lambda x: pd.Period(x['Mth'],'Q'),axis=1)
Pranav Kansara asked Nov 01 '16 21:11


3 Answers

I would use the to_datetime() method in a vectorized manner:

In [76]: x
Out[76]:
       Mth
0  2016-11
1  2011-01
2  2015-07
3  2012-09

In [77]: x['Qtr'] = pd.to_datetime(x.Mth).dt.quarter

In [78]: x
Out[78]:
       Mth  Qtr
0  2016-11    4
1  2011-01    1
2  2015-07    3
3  2012-09    3

Or, if you want it in 2016Q4 format (as @root mentioned), use PeriodIndex():

In [114]: x['Qtr'] = pd.PeriodIndex(pd.to_datetime(x.Mth), freq='Q')

In [115]: x
Out[115]:
       Mth    Qtr
0  2016-11 2016Q4
1  2011-01 2011Q1
2  2015-07 2015Q3
3  2012-09 2012Q3
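
For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch of both variants. The sample frame and the QtrNum column name are my own assumptions for illustration, and it uses the Series .dt.to_period('Q') accessor as an alternative spelling of the PeriodIndex() call, which should give the same 2016Q4-style values.

```python
import pandas as pd

# Hypothetical sample frame; 'Mth' holds yyyy-mm strings as in the question.
df = pd.DataFrame({'Mth': ['2016-11', '2011-01', '2015-07', '2012-09']})

# Parse the whole column once, with an explicit format so pandas
# does not have to infer it value by value.
parsed = pd.to_datetime(df['Mth'], format='%Y-%m')

df['QtrNum'] = parsed.dt.quarter      # integer quarter, 1 to 4
df['Qtr'] = parsed.dt.to_period('Q')  # Period values such as 2016Q4

print(df)
```

Both new columns come from the same parsed Series, so the strings are only parsed once.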
MaxU - stop WAR against UA answered Oct 20 '22 03:10


Since you don't need the whole row, is it faster if you map the values from the column alone?

hp2['Qtr'] = hp2['Mth'].map(lambda x: pd.Period(x,'Q'))
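
A possible variation on the same idea, not something tested on the asker's data: a month column usually has far fewer distinct values than rows, so you could build each Period once per distinct month and map through a dict. The hp2 frame below is a made-up stand-in for the real one.

```python
import pandas as pd

# Hypothetical stand-in for the asker's frame, with repeated months.
hp2 = pd.DataFrame({'Mth': ['2016-11', '2011-01', '2016-11', '2012-09', '2011-01']})

# Construct one Period per distinct month string instead of one per row.
lookup = {m: pd.Period(m, 'Q') for m in hp2['Mth'].unique()}

# Series.map accepts a dict and looks each value up in it.
hp2['Qtr'] = hp2['Mth'].map(lookup)

print(hp2)
```

Whether this helps depends on how many distinct months the column has; with nearly unique values it would do about the same work as the plain map.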
neuromusic answered Oct 20 '22 04:10


Same idea as @MaxU but using astype:

hp2['Qtr'] = pd.to_datetime(hp2['Mth'].values, format='%Y-%m').astype('period[Q]')

The resulting output:

        Mth    Qtr
0   2014-01 2014Q1
1   2017-02 2017Q1
2   2016-03 2016Q1
3   2017-04 2017Q2
4   2016-05 2016Q2
5   2016-06 2016Q2
6   2017-07 2017Q3
7   2016-08 2016Q3
8   2017-09 2017Q3
9   2015-10 2015Q4
10  2017-11 2017Q4
11  2015-12 2015Q4

Timings

Using the following setup to produce a large sample dataset:

import numpy as np
import pandas as pd

n = 10**5
yrs = np.random.choice(range(2010, 2021), n)
mths = np.random.choice(range(1, 13), n)
df = pd.DataFrame({'Mth': ['{0}-{1:02d}'.format(*p) for p in zip(yrs, mths)]})

I get the following timings:

%timeit pd.to_datetime(df['Mth'].values, format='%Y-%m').astype('period[Q]')
10 loops, best of 3: 33.4 ms per loop

%timeit pd.PeriodIndex(pd.to_datetime(df.Mth), freq='Q')
1 loop, best of 3: 2.68 s per loop

%timeit df['Mth'].map(lambda x: pd.Period(x,'Q'))
1 loop, best of 3: 6.26 s per loop

%timeit df.apply(lambda x: pd.Period(x['Mth'],'Q'),axis=1)
1 loop, best of 3: 9.49 s per loop
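
The %timeit lines above need IPython. If you want to repeat the comparison from a plain script, something along these lines with the standard timeit module should work. This is only a sketch: the smaller n, the fixed seed and the two helper function names are my own choices, and the numbers you get will differ from the ones above.

```python
import timeit

import numpy as np
import pandas as pd

n = 10**4  # smaller than the 10**5 used above, to keep the run short
rng = np.random.default_rng(0)  # fixed seed, an arbitrary choice
yrs = rng.integers(2010, 2021, n)
mths = rng.integers(1, 13, n)
df = pd.DataFrame(
    {'Mth': ['{0}-{1:02d}'.format(int(y), int(m)) for y, m in zip(yrs, mths)]}
)

def vectorized():
    # Parse the whole column at once, then convert to quarterly periods.
    return pd.to_datetime(df['Mth'].values, format='%Y-%m').astype('period[Q]')

def mapped():
    # Build one Period per row with a Python-level lambda.
    return df['Mth'].map(lambda x: pd.Period(x, 'Q'))

for fn in (vectorized, mapped):
    # Best of three single runs, similar in spirit to %timeit's "best of 3".
    best = min(timeit.repeat(fn, number=1, repeat=3))
    print('{0}: {1:.4f} s'.format(fn.__name__, best))
```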
root answered Oct 20 '22 03:10