pandas: convert datetime to end-of-month

Tags:

python

pandas

I have written a function to convert pandas datetime dates to month-end:

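import pandas
import numpy
import datetime
from pandas.tseries.offsets import Day, MonthEnd

def get_month_end(d):
    # Step back one day first: MonthEnd() always rolls forward, so
    # 31/March + MonthEnd() would return 30/April.
    month_end = d - Day() + MonthEnd()
    if month_end.month == d.month:
        return month_end
    # Report the bad conversion; the dates are formatted into the message
    # because str + Timestamp cannot be concatenated directly.
    raise ValueError("Something went wrong while converting dates to EOM: "
                     f"{d} was converted to {month_end}")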
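To see why the function subtracts a Day before adding MonthEnd(), here is a small sketch on a few hand-picked dates (the dates are illustrative, not taken from the question's data). MonthEnd() always rolls forward to the next month end, so a date that is already a month end would otherwise jump to the following month.

```python
import pandas as pd
from pandas.tseries.offsets import Day, MonthEnd

mar_01 = pd.Timestamp("2013-03-01")
mar_15 = pd.Timestamp("2013-03-15")
mar_31 = pd.Timestamp("2013-03-31")

# A date that is already a month end jumps to the *next* month end.
print(mar_31 + MonthEnd())          # 2013-04-30 00:00:00

# Stepping back one day first keeps month-end dates in their own month...
print(mar_31 - Day() + MonthEnd())  # 2013-03-31 00:00:00
# ...still rolls mid-month dates forward to the end of the month...
print(mar_15 - Day() + MonthEnd())  # 2013-03-31 00:00:00
# ...and the 1st steps back to the previous month end, then forward one month.
print(mar_01 - Day() + MonthEnd())  # 2013-03-31 00:00:00
```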

This function seems to be quite slow, and I was wondering if there is a faster alternative. I noticed the slowness because I am running this on a dataframe column with 50,000 dates, and the code has been much slower since I introduced this function (i.e. before I started converting dates to end-of-month).

df = pandas.read_csv(inpath, na_values = nas, converters = {open_date: read_as_date})
df[open_date] = df[open_date].apply(get_month_end)
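Since get_month_end only does offset arithmetic, the same two offsets can likely be applied to the whole column in one vectorised expression instead of through .apply(), which calls a Python function once per row. A minimal sketch, assuming the column already has a datetime64 dtype; the sample Series below is hypothetical and stands in for df[open_date]:

```python
import pandas as pd

# Hypothetical sample column standing in for df[open_date].
dates = pd.Series(pd.to_datetime(["2013-01-01", "2013-01-31",
                                  "2013-03-15", "2013-03-31"]))

# Same arithmetic as get_month_end, applied to the whole column at once.
month_ends = dates - pd.offsets.Day() + pd.offsets.MonthEnd()

# Vectorised version of the sanity check in get_month_end.
if not (month_ends.dt.month == dates.dt.month).all():
    raise ValueError("some dates were not converted to their own month end")

print(month_ends)
```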

I am not sure if that's relevant, but I am reading the dates in as follows:

def read_as_date(x):
    return datetime.datetime.strptime(x, fmt)
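If the per-row strptime converter is also a bottleneck, one option is to read the column as plain strings and parse it with a single pd.to_datetime call. A sketch under assumptions: the question does not show fmt, inpath or the column name, so the format %m/%d/%Y, the inline CSV text and the name open_date below are hypothetical.

```python
import io
import pandas as pd

# Hypothetical stand-ins for the question's inpath, open_date and fmt.
csv_text = "open_date,value\n08/14/2013,1\n03/31/2013,2\n"
open_date = "open_date"
fmt = "%m/%d/%Y"

# Read the column as plain strings, then parse it in one vectorised call
# instead of calling datetime.strptime once per row via `converters`.
df = pd.read_csv(io.StringIO(csv_text))
df[open_date] = pd.to_datetime(df[open_date], format=fmt)
print(df.dtypes)
```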
asked Aug 14 '13 13:08 by Anne



3 Answers

Revised: converting to a period and then back to a timestamp does the trick.

In [104]: df = DataFrame(dict(date = [Timestamp('20130101'),Timestamp('20130131'),Timestamp('20130331'),Timestamp('20130330')],value=randn(4))).set_index('date')

In [105]: df
Out[105]: 
               value
date                
2013-01-01 -0.346980
2013-01-31  1.954909
2013-03-31 -0.505037
2013-03-30  2.545073

In [106]: df.index = df.index.to_period('M').to_timestamp('M')

In [107]: df
Out[107]: 
               value
2013-01-31 -0.346980
2013-01-31  1.954909
2013-03-31 -0.505037
2013-03-31  2.545073

Note that the same conversion can also be done by adding a MonthEnd(0) offset, as below, although the approach above would be slightly faster.

In [85]: df.index + pd.offsets.MonthEnd(0) 
Out[85]: DatetimeIndex(['2013-01-31', '2013-01-31', '2013-03-31', '2013-03-31'], dtype='datetime64[ns]', name=u'date', freq=None, tz=None)
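The session above assumes DataFrame, Timestamp and randn were imported beforehand. Below is a self-contained sketch of both approaches on the same four dates. It spells out how='end' and then calls normalize() to get midnight on the last day of each month, as one explicit way to write the period round-trip rather than relying on the freq argument of to_timestamp.

```python
import numpy as np
import pandas as pd

# Same four dates as the session above; the random values are irrelevant.
idx = pd.DatetimeIndex(
    ["2013-01-01", "2013-01-31", "2013-03-31", "2013-03-30"], name="date"
)
df = pd.DataFrame({"value": np.random.randn(4)}, index=idx)

# Approach 1: round-trip through monthly periods; how="end" gives the last
# instant of each month, and normalize() truncates it to midnight.
via_period = df.index.to_period("M").to_timestamp(how="end").normalize()

# Approach 2: roll each date forward to its month end with MonthEnd(0).
via_offset = df.index + pd.offsets.MonthEnd(0)

print(via_period)
print(via_offset)
```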
answered Oct 08 '22 11:10 by Jeff


import pandas as pd

df0['Calendar day'] = pd.to_datetime(df0['Calendar day'], format='%m/%d/%Y')
# pd.datetools no longer exists in current pandas; .dt.normalize() sets the time to midnight.
df0['Calendar day'] = df0['Calendar day'].dt.normalize()
# Vectorised equivalent of .apply(lambda r: r.start_time) on the period column.
df0['Month Start Date'] = df0['Calendar day'].dt.to_period('M').dt.start_time

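This code should work. Here 'Calendar day' is a column holding dates as strings in the format %m/%d/%Y; for example, 12/28/2014 is 28 December 2014. Note that this computes the first day of the month: the output for that example is 2014-12-01, as a pandas Timestamp (shown as class 'pandas.tslib.Timestamp' in older pandas versions).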
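Since the question asks for the end of the month, the same period round-trip can be pointed at end_time instead of start_time. A minimal sketch on hypothetical sample dates; end_time is the last nanosecond of each period, so .dt.normalize() truncates it to midnight.

```python
import pandas as pd

# Hypothetical data in the answer's %m/%d/%Y format.
df0 = pd.DataFrame({"Calendar day": ["12/28/2014", "12/01/2014", "01/15/2015"]})
df0["Calendar day"] = pd.to_datetime(df0["Calendar day"], format="%m/%d/%Y")

periods = df0["Calendar day"].dt.to_period("M")
# First day of each month (midnight).
df0["Month Start Date"] = periods.dt.start_time
# Last day of each month: end_time is 23:59:59.999999999, so normalise it.
df0["Month End Date"] = periods.dt.end_time.dt.normalize()
print(df0)
```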

answered Oct 08 '22 10:10 by Piyush Jena


If the date column has a datetime dtype, adding MonthEnd(0) moves each date to the last day of its own month; dates that are already month ends are left unchanged:

df['date1'] = df['date'] + pd.offsets.MonthEnd(0)
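To illustrate the difference between MonthEnd(0) and the default MonthEnd() (which is MonthEnd(1)), here is a small sketch on hypothetical dates:

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["2013-01-01", "2013-01-15", "2013-01-31"])})

# MonthEnd(0) rolls forward only if the date is not already a month end.
df["date1"] = df["date"] + pd.offsets.MonthEnd(0)
# MonthEnd(1) always moves to the *next* month end, so a date that is
# already a month end jumps a whole month.
df["date_n1"] = df["date"] + pd.offsets.MonthEnd(1)
print(df)
```

This is also why the question's get_month_end subtracts a Day before adding MonthEnd(): for dates like these, adding MonthEnd(0) alone should give the same result.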
answered Oct 08 '22 10:10 by Dimanjan