
Find the end of the month of a Pandas DataFrame Series

I have a series within a DataFrame that I read in initially as an object, and then need to convert it to a date in the form of yyyy-mm-dd where dd is the end of the month.

As an example, I have DataFrame df with a column Date as an object:

...      Date    ...
...     200104   ...
...     200508   ...

What I want when this is all said and done is a date object:

...      Date    ...
...  2001-04-30  ...
...  2005-08-31  ...

such that df['Date'].item() returns

datetime.date(2001, 4, 30)

I've used the following code to get almost there, but all my dates are at the beginning of the month, not the end. Please advise.

df['Date'] = pd.to_datetime(df['Date'], format="%Y%m").dt.date 

Note: I've already imported pandas as pd, and datetime as dt.

Lisle asked May 20 '16 19:05



2 Answers

You can use pandas.tseries.offsets.MonthEnd:

from pandas.tseries.offsets import MonthEnd

df['Date'] = pd.to_datetime(df['Date'], format="%Y%m") + MonthEnd(1)

The 1 in MonthEnd specifies moving one step forward to the next date that is a month end. (Using 0 or leaving it blank would also work in your case, because the parsed dates all fall on the first of the month.) If you wanted the last day of the next month, you'd use MonthEnd(2), etc. This works for any month, so you don't need to know the number of days in the month or anything like that. More offset information can be found in the documentation.

Example usage and output:

df = pd.DataFrame({'Date': [200104, 200508, 201002, 201602, 199912, 200611]})
df['EndOfMonth'] = pd.to_datetime(df['Date'], format="%Y%m") + MonthEnd(1)

     Date EndOfMonth
0  200104 2001-04-30
1  200508 2005-08-31
2  201002 2010-02-28
3  201602 2016-02-29
4  199912 1999-12-31
5  200611 2006-11-30
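The question asked for datetime.date objects rather than Timestamps. A minimal sketch of that last step, assuming the Date column holds year-month strings such as '200104' (the sample data here is made up for illustration), and using MonthEnd(0) so that a value already on a month end would be left unchanged:

```python
import datetime

import pandas as pd
from pandas.tseries.offsets import MonthEnd

# Sample data mirroring the question: year-month values stored as strings.
df = pd.DataFrame({"Date": ["200104", "200508"]})

# Parse as the first day of each month, roll forward to the month end,
# then convert the Timestamps to plain datetime.date objects.
month_start = pd.to_datetime(df["Date"], format="%Y%m")
df["Date"] = (month_start + MonthEnd(0)).dt.date

print(df["Date"].iloc[0])
```

After .dt.date the column has object dtype, so further datetime arithmetic in pandas would need the values converted back with pd.to_datetime.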
root answered Sep 17 '22 14:09


Agreed that what root offers is the right method. However, readers who blindly use MonthEnd(1) are in for a surprise if the input is already the last day of a month:

In [4]: pd.Timestamp('2014-01-01') + MonthEnd(1)
Out[4]: Timestamp('2014-01-31 00:00:00')

In [5]: pd.Timestamp('2014-01-31') + MonthEnd(1)
Out[5]: Timestamp('2014-02-28 00:00:00')

Using MonthEnd(0) instead gives this:

In [7]: pd.Timestamp('2014-01-01') + MonthEnd(0)
Out[7]: Timestamp('2014-01-31 00:00:00')

In [8]: pd.Timestamp('2014-01-31') + MonthEnd(0)
Out[8]: Timestamp('2014-01-31 00:00:00')
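The same difference shows up when the offset is added to a whole column. A small sketch, with made-up sample dates, comparing the two offsets on a Series that mixes a month start, a mid-month date, and a date already at month end:

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

# Mixed input: a month start, a mid-month date, and a date already at month end.
dates = pd.Series(pd.to_datetime(["2014-01-01", "2014-01-15", "2014-01-31"]))

rolled_with_zero = dates + MonthEnd(0)  # month-end dates stay where they are
rolled_with_one = dates + MonthEnd(1)   # month-end dates jump to the next month end

comparison = pd.DataFrame(
    {"input": dates, "MonthEnd(0)": rolled_with_zero, "MonthEnd(1)": rolled_with_one}
)
print(comparison)
```

Only the last row differs: MonthEnd(0) keeps 2014-01-31, while MonthEnd(1) moves it to 2014-02-28.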

Example to obtain the month end as a string:

from pandas.tseries.offsets import MonthEnd

(pd.Timestamp.now() + MonthEnd(0)).strftime('%Y-%m-%dT00:00:00')
# '2014-01-31T00:00:00'
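For a whole column, the .dt accessor offers the same formatting. A hedged sketch, again with made-up sample timestamps: adding MonthEnd(0) keeps the original time of day, so normalize() is used here to reset it to midnight before formatting.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

# Sample timestamps that carry a time of day.
dates = pd.Series(pd.to_datetime(["2014-01-10 08:30", "2014-02-03 17:45"]))

# Roll to the month end, then reset the time component to midnight.
month_end = (dates + MonthEnd(0)).dt.normalize()

# Format every value as a 'YYYY-MM-DD' string.
labels = month_end.dt.strftime("%Y-%m-%d")
print(labels.tolist())
```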
Martien Lubberink answered Sep 20 '22 14:09