Say I have a list of datetimes, and we know each datetime to be the recorded time of an event happening.
Is it possible in matplotlib to graph the frequency of this event occurring over time, showing this data in a cumulative graph (so that each point is greater or equal to all of the points that went before it), without preprocessing this list? (e.g. passing datetime objects directly to some wonderful matplotlib function)
Or do I need to turn this list of datetimes into a list of dictionary items, such as:
{"year": 1998, "month": 12, "date": 15, "events": 92}
and then generate a graph from this list?
Practical Data Science using Python To get a reverse-order cumulative histogram in Matplotlib, we can use cumulative = -1 in the hist() method. Set the figure size and adjust the padding between and around the subplots. Make a list of data points. Plot a histogram with data and cumulative = -1.
In X-axis we should have a variable of DateTime. In Y-axis we can have the variable which we want to analyze with respect to time. plt. plot() method is used to plot the graph in matplotlib.
This should work for you:
counts = arange(0, len(list_of_dates))
plot(list_of_dates, counts)
You can of course give any of the usual options to the plot
call to make the graph look the way you want it. (I'll point out that matplotlib is very adept at handling dates and times.)
Another option would be the hist function - it has an option 'cumulative=True' that might be useful. You can create a cumulative histogram showing the number of events that have occurred as of any given date something like this:
from pyplot import hist
from matplotlib.dates import date2num
hist(date2num(list_of_dates), cumulative=True)
But this produces a bar chart, which might not be quite what you're looking for, and in any case making the date labels on the horizontal axis display properly will probably require some fudging.
EDIT: I'm getting the sense that what you really want is one point (or bar) per date, with the corresponding y-value being the number of events that have occurred up to (and including?) that date. In that case, I'd suggest doing something like this:
grouped_dates = [[d, len(list(g))] for d,g in itertools.groupby(list_of_dates, lambda k: k.date())]
dates, counts = grouped_dates.transpose()
counts = counts.cumsum()
step(dates, counts)
The groupby
function from the itertools
module will produce the kind of data you're looking for: only a single instance of each date, accompanied by a list (an iterator, actually) of all the datetime
objects that have that date. As suggested by Jouni in the comments, the step
function will give a graph that steps up at each day on which events occurred, so I'd suggest using that in place of plot
.
(Hat tip to EOL for reminding me about cumsum
)
If you want to have one point for every day, regardless of whether any events occurred on that day or not, you'll need to alter the above code a bit:
from matplotlib.dates import drange, num2date
date_dict = dict((d, len(list(g))) for d,g in itertools.groupby(list_of_dates, lambda k: k.date()))
dates = num2date(drange(min(list_of_dates).date(), max(list_of_dates).date() + timedelta(1), timedelta(1)))
counts = asarray([date_dict.get(d.date(), 0) for d in dates]).cumsum()
step(dates, counts)
I don't think it'll really make a difference for the plot produced by the step
function though.
So, you start with a list of dates that you want to histogram:
from datetime import datetime
list_of_datetime_datetime_objects = [datetime(2010, 6, 14), datetime(1974, 2, 8), datetime(1974, 2, 8)]
Matplotlib allows you to convert a datetime.datetime
object into a simple number, as David mentioned:
from matplotlib.dates import date2num, num2date
num_dates = [date2num(d) for d in list_of_datetime_datetime_objects]
You can then calculate the histogram of your data (look at NumPy histogram
docs for more options (number of bins, etc.)):
import numpy
histo = numpy.histogram(num_dates)
Since you want the cumulative histogram, you add individual counts together:
cumulative_histo_counts = histo[0].cumsum()
The histogram plot will need the bin size:
from matplotlib import pyplot
You can then plot the cumulative histogram:
bin_size = histo[1][1]-histo[1][0]
pyplot.bar(histo[1][:-1], cumulative_histo_counts, width=bin_size)
Alternatively, you might want a curve instead of an histogram:
# pyplot.plot(histo[1][1:], cumulative_histo_counts)
If you want dates on the x axis instead of numbers, you can convert the numbers back to dates and ask matplotlib to use date strings as ticks, instead of numbers:
from matplotlib import ticker
# The format for the x axis is set to the chosen string, as defined from a numerical date:
pyplot.gca().xaxis.set_major_formatter(ticker.FuncFormatter(lambda numdate, _: num2date(numdate).strftime('%Y-%d-%m')))
# The formatting proper is done:
pyplot.gcf().autofmt_xdate()
# To show the result:
pyplot.show() # or draw(), if you don't want to block
Here, gca()
and gcf()
return the current axis and figure, respectively.
Of course, you can adapt the way you display dates, in the call to strftime()
above.
To go beyond your question, I would like to mention that Matplotlib's gallery is a very good source of information: you can generally quickly find what you need by just finding images that look like what you're trying to do, and looking at their source code.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With