Plotting a cumulative graph of python datetimes

Say I have a list of datetimes, each of which is the recorded time of an event.

Is it possible in matplotlib to graph the frequency of this event occurring over time, showing this data in a cumulative graph (so that each point is greater or equal to all of the points that went before it), without preprocessing this list? (e.g. passing datetime objects directly to some wonderful matplotlib function)

Or do I need to turn this list of datetimes into a list of dictionary items, such as:

{"year": 1998, "month": 12, "date": 15, "events": 92}

and then generate a graph from this list?

asked Jun 13 '10 22:06

ventolin

2 Answers

This should work for you:

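from numpy import arange
from matplotlib.pyplot import plot
counts = arange(0, len(list_of_dates))
plot(sorted(list_of_dates), counts)  # sort so that the curve runs in chronological order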

You can of course give any of the usual options to the plot call to make the graph look the way you want it. (I'll point out that matplotlib is very adept at handling dates and times.)
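As a minimal, self-contained sketch of the approach above: the sample datetimes are made up, the Agg backend is only an assumption so that the code runs without a display, and the counts start at 1 rather than 0, which is one possible reading of "cumulative" (the first event brings the total to one).

```python
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Made-up sample events, deliberately out of order.
list_of_dates = [
    datetime(2010, 6, 14, 9, 30),
    datetime(2010, 6, 12, 17, 5),
    datetime(2010, 6, 13, 8, 0),
    datetime(2010, 6, 14, 22, 15),
]

sorted_dates = sorted(list_of_dates)          # chronological order
counts = np.arange(1, len(sorted_dates) + 1)  # running total after each event

fig, ax = plt.subplots()
# steps-post keeps the curve flat until the next event, then jumps by one.
ax.plot(sorted_dates, counts, drawstyle="steps-post")
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
```

Call plt.show() or fig.savefig(...) afterwards to see the result.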

Another option would be the hist function - it has an option 'cumulative=True' that might be useful. You can create a cumulative histogram showing the number of events that have occurred as of any given date something like this:

from matplotlib.pyplot import hist
from matplotlib.dates import date2num
hist(date2num(list_of_dates), cumulative=True)

But this produces a bar chart, which might not be quite what you're looking for, and in any case making the date labels on the horizontal axis display properly will probably require some fudging.
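One way to do the date-label "fudging" might look like the sketch below. It assumes made-up sample data, an arbitrary choice of 5 bins, and the Agg backend so that it runs without a display; xaxis_date() tells the axes to interpret the numeric x values as dates.

```python
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, drop this line for interactive use
import matplotlib.pyplot as plt
from matplotlib.dates import date2num

# Made-up sample events.
list_of_dates = [
    datetime(2010, 6, 1),
    datetime(2010, 6, 3),
    datetime(2010, 6, 3),
    datetime(2010, 6, 8),
    datetime(2010, 6, 20),
]

fig, ax = plt.subplots()
# n holds the cumulative count per bin; bins holds the bin edges as date numbers.
n, bins, patches = ax.hist(date2num(list_of_dates), bins=5, cumulative=True)
ax.xaxis_date()      # treat the x values as dates when labelling ticks
fig.autofmt_xdate()  # rotate the labels so they stay readable
```

The last bar always has a height equal to the total number of events, which is a quick sanity check on the result.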

EDIT: I'm getting the sense that what you really want is one point (or bar) per date, with the corresponding y-value being the number of events that have occurred up to (and including?) that date. In that case, I'd suggest doing something like this:

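import itertools
from numpy import cumsum
from matplotlib.pyplot import step
# groupby only merges adjacent items, so the dates must be sorted first
grouped_dates = [(d, len(list(g))) for d,g in itertools.groupby(sorted(list_of_dates), lambda k: k.date())]
dates, counts = zip(*grouped_dates)
counts = cumsum(counts)
step(dates, counts)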

The groupby function from the itertools module will produce the kind of data you're looking for: only a single instance of each date, accompanied by a list (an iterator, actually) of all the datetime objects that have that date. As suggested by Jouni in the comments, the step function will give a graph that steps up at each day on which events occurred, so I'd suggest using that in place of plot.

(Hat tip to EOL for reminding me about cumsum)
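To make the grouping step concrete, here is a small sketch on made-up data that leaves the plotting out. Note that groupby only groups consecutive items, which is why the list is sorted first; the variable names are illustrative only.

```python
import itertools
from datetime import date, datetime

import numpy as np

# Made-up sample events, deliberately out of order.
list_of_dates = [
    datetime(2010, 6, 14, 9, 30),
    datetime(2010, 6, 12, 17, 5),
    datetime(2010, 6, 14, 22, 15),
    datetime(2010, 6, 13, 8, 0),
    datetime(2010, 6, 14, 11, 45),
]

# One (day, number of events) pair per distinct calendar day.
per_day = [
    (day, len(list(group)))
    for day, group in itertools.groupby(sorted(list_of_dates), key=lambda d: d.date())
]

# Split the pairs into two parallel sequences, then accumulate the counts.
days, daily_counts = zip(*per_day)
cumulative = np.cumsum(daily_counts)
```

The days and cumulative sequences can then be passed to step() as in the snippet above.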

If you want to have one point for every day, regardless of whether any events occurred on that day or not, you'll need to alter the above code a bit:

import itertools
from datetime import timedelta
from numpy import asarray
from matplotlib.pyplot import step
from matplotlib.dates import drange, num2date
date_dict = dict((d, len(list(g))) for d,g in itertools.groupby(sorted(list_of_dates), lambda k: k.date()))
dates = num2date(drange(min(list_of_dates).date(), max(list_of_dates).date() + timedelta(1), timedelta(1)))
counts = asarray([date_dict.get(d.date(), 0) for d in dates]).cumsum()
step(dates, counts)

I don't think it'll really make a difference for the plot produced by the step function though.
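For comparison, the same "one point per day" series can be built with the standard library alone. This is a sketch on made-up data that swaps in collections.Counter and itertools.accumulate for groupby and cumsum; it is not the code from the answer above.

```python
from collections import Counter
from datetime import date, datetime, timedelta
from itertools import accumulate

# Made-up sample events with a two-day gap.
list_of_dates = [
    datetime(2010, 6, 10, 9, 30),
    datetime(2010, 6, 10, 17, 5),
    datetime(2010, 6, 13, 8, 0),
]

# Number of events per calendar day; missing days count as 0 when looked up.
events_per_day = Counter(d.date() for d in list_of_dates)

first_day = min(events_per_day)
last_day = max(events_per_day)
n_days = (last_day - first_day).days + 1

# Every day from the first to the last event, including days without events.
all_days = [first_day + timedelta(days=i) for i in range(n_days)]
cumulative = list(accumulate(events_per_day[day] for day in all_days))
```

Days without events simply repeat the previous total, which is why the step plot looks the same with or without them.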

answered Oct 07 '22 19:10

David Z

So, you start with a list of dates that you want to histogram:

from datetime import datetime
list_of_datetime_datetime_objects = [datetime(2010, 6, 14), datetime(1974, 2, 8), datetime(1974, 2, 8)]

Matplotlib allows you to convert a datetime.datetime object into a simple number, as David mentioned:

from matplotlib.dates import date2num, num2date
num_dates = [date2num(d) for d in list_of_datetime_datetime_objects]

You can then calculate the histogram of your data (see the NumPy histogram documentation for more options, such as the number of bins):

import numpy
histo = numpy.histogram(num_dates)

Since you want the cumulative histogram, you add individual counts together:

cumulative_histo_counts = histo[0].cumsum()
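The steps so far can be sketched end to end on made-up data. numpy.histogram returns a pair (counts, bin edges), so histo[0] above is the array of counts and histo[1] the array of edges; the choice of 3 bins here is arbitrary.

```python
from datetime import datetime

import numpy as np
from matplotlib.dates import date2num

# Made-up sample events spanning nine days.
events = [
    datetime(2010, 6, 1),
    datetime(2010, 6, 2),
    datetime(2010, 6, 2),
    datetime(2010, 6, 9),
    datetime(2010, 6, 10),
]

num_dates = date2num(events)  # dates as floating-point day numbers

# Three equal-width bins of three days each.
counts, bin_edges = np.histogram(num_dates, bins=3)

# Running total: the number of events up to the end of each bin.
cumulative_counts = counts.cumsum()
```

There is always one more bin edge than there are bins, and the last cumulative count equals the total number of events.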

The histogram plot will need the bin size:

bin_size = histo[1][1]-histo[1][0]

You can then plot the cumulative histogram:

from matplotlib import pyplot
# align='edge' puts the left side of each bar at the left edge of its bin
pyplot.bar(histo[1][:-1], cumulative_histo_counts, width=bin_size, align='edge')

Alternatively, you might want a curve instead of a histogram:

# pyplot.plot(histo[1][1:], cumulative_histo_counts)

If you want dates on the x axis instead of numbers, you can convert the numbers back to dates and ask matplotlib to use date strings as ticks, instead of numbers:

from matplotlib import ticker

# The format for the x axis is set to the chosen string, as defined from a numerical date:
pyplot.gca().xaxis.set_major_formatter(ticker.FuncFormatter(lambda numdate, _: num2date(numdate).strftime('%Y-%m-%d')))
# The formatting proper is done:
pyplot.gcf().autofmt_xdate()
# To show the result:
pyplot.show()  # or draw(), if you don't want to block

Here, gca() and gcf() return the current axes and the current figure, respectively.

Of course, you can adapt the way you display dates, in the call to strftime() above.
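As an alternative to the FuncFormatter lambda, matplotlib.dates also provides DateFormatter, which takes the strftime pattern directly. The sketch below assumes made-up data and the Agg backend so that it runs without a display; the pattern is just one possible choice.

```python
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, drop this line for interactive use
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter, date2num

# Made-up sample events, sorted chronologically.
events = sorted([
    datetime(2010, 6, 14),
    datetime(2010, 6, 12),
    datetime(2010, 6, 13),
])

fig, ax = plt.subplots()
# Cumulative count as a step curve, with datetimes passed directly as x values.
ax.step(events, range(1, len(events) + 1), where="post")

# Format the tick labels as year-month-day.
formatter = DateFormatter("%Y-%m-%d")
ax.xaxis.set_major_formatter(formatter)
fig.autofmt_xdate()

# A formatter can also be called directly on a date number.
label = formatter(date2num(datetime(2010, 6, 14)))
```

Changing the pattern passed to DateFormatter changes the labels in the same way as editing the strftime() call above.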

To go beyond your question, I would like to mention that Matplotlib's gallery is a very good source of information: you can generally quickly find what you need by just finding images that look like what you're trying to do, and looking at their source code.

[Figure: example of a cumulative curve with datetime labels]

answered Oct 07 '22 18:10

Eric O Lebigot

Tags: python, datetime, graph, matplotlib