Unable to align bins in a histogram of datetime objects using the hist() function

I am trying to plot a histogram of some datetime objects, but I can never get the bins to line up with the bars. My code is below:

I start with the imports and define the start date, the end date and a one-day timedelta:

import datetime
import matplotlib.pyplot as plt

end = datetime.date(2017,5,14) 
start = datetime.date(2017,5,8) 
one_day = datetime.timedelta(days = 1)  

Then I declare an arbitrary list of dates:

date_list = [datetime.date(2017,5,14), datetime.date(2017,5,14),
             datetime.date(2017,5,14), datetime.date(2017,5,9),
             datetime.date(2017,5,13), datetime.date(2017,5,12),
             datetime.date(2017,5,11), datetime.date(2017,5,11),
             datetime.date(2017,5,9)]

Then I go through the range of days between my start and end (in this case a week), adding each date to a list:

week = [] 
for i in range((end-start).days+1):  
    week.append(start + (i)*one_day)

The reason for this is that some of the days in the week don't come up in the date_list (I know I could just skip this for this case, but I want it to be expandable to other date_lists).

Then I plot with hist():

plt.hist(date_list, bins = len(week)+1)
plt.show()

I've tried all manner of combinations of bin formats, with various +1's, ranges and align settings, but the dates never sit in a consistent place relative to the bars.

emily asked May 13 '17 14:05




1 Answer

If you only set the number of bins, the bins are distributed evenly between the smallest and the largest value in the data. Since the data is missing the start date, for example, the bins span less than the full week, and each bin ends up narrower than a complete day.
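To make the bin widths concrete, here is a minimal sketch of the arithmetic for the question's data. It assumes that date ordinals from the standard library can stand in for Matplotlib's date numbers (both count in days) and that the edges are spaced evenly between the minimum and maximum of the data; the names `ordinals`, `bin_width` and `offsets` are only for illustration.

```python
import datetime

date_list = [datetime.date(2017, 5, 14), datetime.date(2017, 5, 14),
             datetime.date(2017, 5, 14), datetime.date(2017, 5, 9),
             datetime.date(2017, 5, 13), datetime.date(2017, 5, 12),
             datetime.date(2017, 5, 11), datetime.date(2017, 5, 11),
             datetime.date(2017, 5, 9)]

# Assumption: day ordinals stand in for Matplotlib date numbers (unit: days).
ordinals = [d.toordinal() for d in date_list]

n_bins = 8  # len(week) + 1 in the question
lo, hi = min(ordinals), max(ordinals)  # May 9 and May 14, so a span of 5 days

# Evenly spaced edges between the smallest and largest data value.
bin_width = (hi - lo) / n_bins
edges = [lo + i * bin_width for i in range(n_bins + 1)]

# Edge positions in days after the earliest date in the data.
offsets = [e - lo for e in edges]
print(bin_width)  # 0.625
print(offsets)
```

With a span of 5 days split into 8 bins, each bin is 0.625 days wide, so the edges drift relative to the midnight ticks and the bars cannot line up with the dates.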

To overcome this, one needs to explicitly pass the desired bin edges to the bins argument of hist. Unfortunately, one cannot directly use the list of datetimes here, so the datetimes need to be converted to numbers first. This can be done using matplotlib's matplotlib.dates.date2num function.

The complete example:

import datetime
import matplotlib.pyplot as plt
from matplotlib.dates import date2num

end = datetime.date(2017,5,14) 
start = datetime.date(2017,5,8) 
one_day = datetime.timedelta(days = 1)  

date_list = [datetime.date(2017,5,14), datetime.date(2017,5,14),
             datetime.date(2017,5,14), datetime.date(2017,5,9),
             datetime.date(2017,5,13), datetime.date(2017,5,12),
             datetime.date(2017,5,11), datetime.date(2017,5,11),
             datetime.date(2017,5,9)]

week = [] 
for i in range((end-start).days+1):  
    week.append(start + (i)*one_day)

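# Convert the daily bin edges to matplotlib date numbers
numweek = date2num(week)

# Pass the explicit edges as bins; ec="k" draws black bar outlines
plt.hist(date_list, bins = numweek, ec="k")
# Rotate and right-align the date tick labels
plt.gcf().autofmt_xdate()
plt.show()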

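Note that the last bin includes its right edge, so datetime.date(2017,5,14) is counted in the bin between (2017,5,13) and (2017,5,14), together with the values from (2017,5,13). To give the last day its own bin, you might want to set the end date to datetime.date(2017,5,15).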
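The effect of the end date can be checked without plotting. The sketch below is not matplotlib's implementation; `day_edges` and `count_per_bin` are hypothetical helpers that mimic the binning rule described in the note (bins are half-open, except the last one, which also includes its right edge).

```python
import datetime
from bisect import bisect_right

date_list = [datetime.date(2017, 5, 14), datetime.date(2017, 5, 14),
             datetime.date(2017, 5, 14), datetime.date(2017, 5, 9),
             datetime.date(2017, 5, 13), datetime.date(2017, 5, 12),
             datetime.date(2017, 5, 11), datetime.date(2017, 5, 11),
             datetime.date(2017, 5, 9)]

def day_edges(start, end):
    """Hypothetical helper: one bin edge per day from start to end, inclusive."""
    return [start + datetime.timedelta(days=i)
            for i in range((end - start).days + 1)]

def count_per_bin(values, edges):
    """Hypothetical helper: half-open bins, except the last, which is closed."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # outside the binned range
        idx = bisect_right(edges, v) - 1
        idx = min(idx, len(counts) - 1)  # right-most edge belongs to the last bin
        counts[idx] += 1
    return counts

start = datetime.date(2017, 5, 8)

# End date May 14: May 13 and May 14 share the last bin.
counts_14 = count_per_bin(date_list, day_edges(start, datetime.date(2017, 5, 14)))
print(counts_14)  # [0, 2, 0, 2, 1, 4]

# End date May 15: every day of the week gets its own bin.
counts_15 = count_per_bin(date_list, day_edges(start, datetime.date(2017, 5, 15)))
print(counts_15)  # [0, 2, 0, 2, 1, 1, 3]
```

With the end date moved to May 15 there are seven one-day bins, and the three May 14 entries are no longer merged with the single May 13 entry.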

ImportanceOfBeingErnest answered Sep 24 '22 14:09