Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Plotting a cumulative graph of python datetimes

Say I have a list of datetimes, and we know each datetime to be the recorded time of an event happening.

Is it possible in matplotlib to graph the frequency of this event occurring over time, showing this data in a cumulative graph (so that each point is greater or equal to all of the points that went before it), without preprocessing this list? (e.g. passing datetime objects directly to some wonderful matplotlib function)

Or do I need to turn this list of datetimes into a list of dictionary items, such as:

{"year": 1998, "month": 12, "date": 15, "events": 92}

and then generate a graph from this list?

like image 551
ventolin Avatar asked Jun 13 '10 22:06

ventolin


People also ask

How do you make a cumulative histogram in Python?

Practical Data Science using Python To get a reverse-order cumulative histogram in Matplotlib, we can use cumulative = -1 in the hist() method. Set the figure size and adjust the padding between and around the subplots. Make a list of data points. Plot a histogram with data and cumulative = -1.

How do I plot time series data in Matplotlib?

In X-axis we should have a variable of DateTime. In Y-axis we can have the variable which we want to analyze with respect to time. plt. plot() method is used to plot the graph in matplotlib.


2 Answers

This should work for you:

counts = arange(0, len(list_of_dates))
plot(list_of_dates, counts)

You can of course give any of the usual options to the plot call to make the graph look the way you want it. (I'll point out that matplotlib is very adept at handling dates and times.)

Another option would be the hist function - it has an option 'cumulative=True' that might be useful. You can create a cumulative histogram showing the number of events that have occurred as of any given date something like this:

from pyplot import hist
from matplotlib.dates import date2num
hist(date2num(list_of_dates), cumulative=True)

But this produces a bar chart, which might not be quite what you're looking for, and in any case making the date labels on the horizontal axis display properly will probably require some fudging.

EDIT: I'm getting the sense that what you really want is one point (or bar) per date, with the corresponding y-value being the number of events that have occurred up to (and including?) that date. In that case, I'd suggest doing something like this:

grouped_dates = [[d, len(list(g))] for d,g in itertools.groupby(list_of_dates, lambda k: k.date())]
dates, counts = grouped_dates.transpose()
counts = counts.cumsum()
step(dates, counts)

The groupby function from the itertools module will produce the kind of data you're looking for: only a single instance of each date, accompanied by a list (an iterator, actually) of all the datetime objects that have that date. As suggested by Jouni in the comments, the step function will give a graph that steps up at each day on which events occurred, so I'd suggest using that in place of plot.

(Hat tip to EOL for reminding me about cumsum)

If you want to have one point for every day, regardless of whether any events occurred on that day or not, you'll need to alter the above code a bit:

from matplotlib.dates import drange, num2date
date_dict = dict((d, len(list(g))) for d,g in itertools.groupby(list_of_dates, lambda k: k.date()))
dates = num2date(drange(min(list_of_dates).date(), max(list_of_dates).date() + timedelta(1), timedelta(1)))
counts = asarray([date_dict.get(d.date(), 0) for d in dates]).cumsum()
step(dates, counts)

I don't think it'll really make a difference for the plot produced by the step function though.

like image 175
David Z Avatar answered Oct 07 '22 19:10

David Z


So, you start with a list of dates that you want to histogram:

from datetime import  datetime
list_of_datetime_datetime_objects = [datetime(2010, 6, 14), datetime(1974, 2, 8), datetime(1974, 2, 8)]

Matplotlib allows you to convert a datetime.datetime object into a simple number, as David mentioned:

from matplotlib.dates import date2num, num2date
num_dates = [date2num(d) for d in list_of_datetime_datetime_objects]

You can then calculate the histogram of your data (look at NumPy histogram docs for more options (number of bins, etc.)):

import numpy
histo = numpy.histogram(num_dates)

Since you want the cumulative histogram, you add individual counts together:

cumulative_histo_counts = histo[0].cumsum()

The histogram plot will need the bin size:

from matplotlib import pyplot

You can then plot the cumulative histogram:

bin_size = histo[1][1]-histo[1][0]
pyplot.bar(histo[1][:-1], cumulative_histo_counts, width=bin_size)

Alternatively, you might want a curve instead of an histogram:

# pyplot.plot(histo[1][1:], cumulative_histo_counts)

If you want dates on the x axis instead of numbers, you can convert the numbers back to dates and ask matplotlib to use date strings as ticks, instead of numbers:

from matplotlib import ticker

# The format for the x axis is set to the chosen string, as defined from a numerical date:
pyplot.gca().xaxis.set_major_formatter(ticker.FuncFormatter(lambda numdate, _: num2date(numdate).strftime('%Y-%d-%m')))
# The formatting proper is done:
pyplot.gcf().autofmt_xdate()
# To show the result:
pyplot.show()  # or draw(), if you don't want to block

Here, gca() and gcf() return the current axis and figure, respectively.

Of course, you can adapt the way you display dates, in the call to strftime() above.

To go beyond your question, I would like to mention that Matplotlib's gallery is a very good source of information: you can generally quickly find what you need by just finding images that look like what you're trying to do, and looking at their source code.

Example of accumulative curve with datetime labels

like image 36
Eric O Lebigot Avatar answered Oct 07 '22 18:10

Eric O Lebigot