 

Make a list of months and years in Pandas

I am trying to make a date_list that covers the months of the last year. It should be formatted like ['Oct 2014', 'Nov 2014', 'Dec 2014', 'Jan 2015', 'Feb 2015', ..., 'Sep 2015']

Sometimes it works, and sometimes it cuts off the current month. It also failed on the last day of the month, so I added this if statement.

# If it is the last day of the month, go back a year and add a day to start at the beginning of the month.
# Otherwise, go back a year, go to the end of the month, and add a day to start at the beginning of the month.

if datetime.datetime.now().day == calendar.monthrange(date.year, date.month)[1]:
    range_min = range_max - pd.tseries.offsets.DateOffset(years=1) + pd.tseries.offsets.DateOffset(days=1, normalize=True)
else:
    range_min = range_max - pd.tseries.offsets.DateOffset(years=1) + pd.tseries.offsets.MonthEnd(1) + pd.tseries.offsets.DateOffset(days=1, normalize=True)

Then I take a slice of my data, which right now goes from 2014-10-01 to 2015-09-02 as expected. Then I make the date list! This is the critical bit that I am struggling with.

# Take the required slice of the data
df = df[(df['recvd_dttm'] >= range_min) &
        (df['recvd_dttm'] <= range_max)]

# Make a date list in order to get the months to plot later on
date_list = pd.DataFrame(index=pd.date_range(start=range_min, end=datetime.datetime.now(), freq='M'))
date_list = date_list.index.to_series().apply(lambda x: datetime.datetime.strftime(x, '%b %Y')).tolist()

It may be a problem with my range, which is this:

range_min
Out[5]: Timestamp('2014-10-01 00:00:00')

range_max
Out[6]: datetime.datetime(2015, 9, 2, 10, 53, 24, 66000)

but the first date_list command outputs this:

pd.DataFrame(index=pd.date_range(start = range_min, end = datetime.datetime.now(), freq='M'))
Out[8]: 
Empty DataFrame
Columns: []
Index: [2014-10-31 00:00:00, 2014-11-30 00:00:00, 2014-12-31 00:00:00, 2015-01-31 00:00:00, 2015-02-28 00:00:00, 2015-03-31 00:00:00, 2015-04-30 00:00:00, 2015-05-31 00:00:00, 2015-06-30 00:00:00, 2015-07-31 00:00:00, 2015-08-31 00:00:00]

This doesn't include 2015-09, so my date_list cuts off September. I'm not sure why this is happening. Also, if anyone knows an easier way to create this list, feel free to include that!

Thanks!

jenryb asked Oct 14 '25 16:10


2 Answers

You can make a list of months in the past year without using pandas.

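import datetime
import dateutil.relativedelta

# Step back 11, 10, ..., 0 months from today, so the list runs oldest to newest
date_list = [datetime.date.today() - dateutil.relativedelta.relativedelta(months=x) for x in range(11, -1, -1)]
month_list = [x.strftime('%b %Y') for x in date_list]

If needed, you can convert the list above into a pandas DataFrame with the months as its index.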

month_list=pd.DataFrame(index=month_list)
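
For anyone who wants to check the result against a known date, here is a minimal sketch of the same idea wrapped in a function. The name last_12_months and its today parameter are made up for illustration; they are not part of the code above. It relies on relativedelta clipping the day of month when the target month is shorter, so it should also behave on the 31st.

```python
import datetime
from dateutil.relativedelta import relativedelta


def last_12_months(today):
    """Return the 12 months ending with today's month as 'Mon YYYY' strings.

    Hypothetical helper: `today` is passed in so the output is reproducible.
    """
    # range(11, -1, -1) steps back 11, 10, ..., 0 months: oldest month first
    dates = [today - relativedelta(months=x) for x in range(11, -1, -1)]
    # '%b' is locale dependent; an English locale is assumed here
    return [d.strftime('%b %Y') for d in dates]


# Date taken from the question (range_max was 2015-09-02)
month_list = last_12_months(datetime.date(2015, 9, 2))
print(month_list)
```

In real use you would pass datetime.date.today() instead of the fixed date.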
kanatti answered Oct 17 '25 04:10


I think your problem is that freq='M' generates month-end dates, so the last one on or before today is 08/31; 09/30 is still in the future and gets dropped. You can look into pd.tseries.offsets.MonthEnd and its rollforward method. I've done that in the past.
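
To make the month-end behaviour concrete, here is a small sketch using the two dates from the question as fixed values (an assumption, so the output does not depend on when you run it). It also shows one possible workaround that the question did not try: the month-start alias 'MS', which keeps September because 2015-09-01 is not after the end date. The month-end offset is passed as an object rather than as the 'M' string, since newer pandas versions prefer a different alias for it.

```python
import pandas as pd

# Fixed dates copied from the question so the result is reproducible
range_min = pd.Timestamp("2014-10-01")
range_max = pd.Timestamp("2015-09-02 10:53:24")

# Month-end frequency (what freq='M' means in the question):
# the last stamp is 2015-08-31 because 2015-09-30 lies after range_max
month_ends = pd.date_range(start=range_min, end=range_max,
                           freq=pd.offsets.MonthEnd())

# Month-start frequency: 2015-09-01 is not after range_max, so September stays
month_starts = pd.date_range(start=range_min, end=range_max, freq="MS")

# '%b' is locale dependent; an English locale is assumed here
date_list = month_starts.strftime("%b %Y").tolist()
print(month_ends[-1])
print(date_list)
```

Note that 'MS' only works this neatly when range_min falls on the first of a month at midnight, as it does here.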

Another solution is to just build a full list of days, and force that into monthly periods. This is a bit hackish, but should push you in the right direction:

pd.date_range(start = range_min, end = datetime.datetime.now()).to_period('M').unique()

Then take the last 12 entries if the range spans more than 12 months. The unique() call is needed, because to_period('M') returns one period per day in the range.
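
Putting that together, a rough end-to-end sketch could look like the following. The dates are the ones from the question, hard-coded as an assumption so the result is reproducible. The pd.period_range call at the end is a shortcut I am adding for comparison, not something from the snippet above; it builds the monthly periods directly without going through a daily range.

```python
import pandas as pd

# Fixed dates copied from the question
range_min = pd.Timestamp("2014-10-01")
range_max = pd.Timestamp("2015-09-02 10:53:24")

# Daily range -> monthly periods. to_period gives one period per day,
# so unique() collapses them to one entry per month.
months = pd.date_range(start=range_min, end=range_max).to_period("M").unique()

# Keep at most the last 12 months, then format as 'Mon YYYY'
# ('%b' is locale dependent; an English locale is assumed here)
date_list = months[-12:].strftime("%b %Y").tolist()

# Shortcut: build the monthly periods directly
direct = pd.period_range(start=range_min, end=range_max, freq="M")
direct_list = direct.strftime("%b %Y").tolist()

print(date_list)
```

Both lists should come out the same for this range; the period_range version just skips the intermediate daily index.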

Brian Pendleton answered Oct 17 '25 06:10


