Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Extracting just Month and Year separately from Pandas Datetime column

I have a Dataframe, df, with the following column:

df['ArrivalDate'] = ... 936   2012-12-31 938   2012-12-29 965   2012-12-31 966   2012-12-31 967   2012-12-31 968   2012-12-31 969   2012-12-31 970   2012-12-29 971   2012-12-31 972   2012-12-29 973   2012-12-29 ... 

The elements of the column are pandas.tslib.Timestamp.

I want to just include the year and month. I thought there would be simple way to do it, but I can't figure it out.

Here's what I've tried:

df['ArrivalDate'].resample('M', how = 'mean') 

I got the following error:

Only valid with DatetimeIndex or PeriodIndex  

Then I tried:

df['ArrivalDate'].apply(lambda(x):x[:-2]) 

I got the following error:

'Timestamp' object has no attribute '__getitem__'  

Any suggestions?

Edit: I sort of figured it out.

df.index = df['ArrivalDate'] 

Then, I can resample another column using the index.

But I'd still like a method for reconfiguring the entire column. Any ideas?

like image 661
monkeybiz7 Avatar asked Aug 05 '14 18:08

monkeybiz7


People also ask

How do you extract month and year from a date field?

Except the above formula, you can also apply this formula: =TEXT(A2, "mmm") & "-" & TEXT(A2, "yyyy"). 2. In above formulas, A2 indicates the date cell that you want to use, and the separator “-” is used to separate the month and year, you can change it to any other delimiters you need.


2 Answers

If you want new columns showing year and month separately you can do this:

df['year'] = pd.DatetimeIndex(df['ArrivalDate']).year df['month'] = pd.DatetimeIndex(df['ArrivalDate']).month 

or...

df['year'] = df['ArrivalDate'].dt.year df['month'] = df['ArrivalDate'].dt.month 

Then you can combine them or work with them just as they are.

like image 95
KieranPC Avatar answered Sep 30 '22 23:09

KieranPC


The df['date_column'] has to be in date time format.

df['month_year'] = df['date_column'].dt.to_period('M') 

You could also use D for Day, 2M for 2 Months etc. for different sampling intervals, and in case one has time series data with time stamp, we can go for granular sampling intervals such as 45Min for 45 min, 15Min for 15 min sampling etc.

like image 37
kabrapankaj32 Avatar answered Sep 30 '22 23:09

kabrapankaj32