
Plot binned counts of a timeseries with pandas

Tags:

python

pandas

I have a pandas DataFrame of events, with the timestamp as the index and a scalar value (its meaning is not important here) in a single column. I would like to plot a time series of how many events happened during each hour.

The original data (much more than displayed here) looks like this:

                              size
timestamp
2015-08-17 15:07:05.628000   50877
2015-08-17 15:07:05.701000   62989
2015-08-17 15:07:05.752000   33790
2015-08-17 15:07:05.802000  100314
2015-08-17 15:07:05.862000   10372

....

I subsequently grouped these events by hour in the following manner:

counts = df.groupby( [df.index.year, df.index.month, df.index.day, df.index.hour] ).count()

i.e. ending up with a multi-level index, with 4 levels.

But now I am struggling to create a nice graph of it. Admittedly, my pandas visualisation skills are very dodgy. I haven't gotten much further than:

counts.plot()

But this makes the x-axis completely unreadable (a sequence of tuples). I'd like the x-axis to be a proper time series that scales nicely with the resolution of the plot etc. I am doing this in IPython, in case it matters. (I guess this question may come down to how to collapse the 4 index levels into one timestamp again).

I'd happily go through some kind of reference, so feel free to point me to any useful links to read up. I looked around, but couldn't immediately find anything on the particular topic.

(Also, feel free to suggest any alternative ways to achieve what I want to do - not sure the multi-level index is the most appropriate).

Thanks!

Joris Peeters asked Nov 09 '22 06:11



1 Answer

I think what you are looking for is resample. It is designed to handle regrouping by time frames. Try:

df.resample('1H').count().plot()
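
For reference, here is a minimal sketch of how this could look end to end. The data is synthetic (the seed, `n_events` and the six-hour window are made up for illustration), and it uses the lowercase `'1h'` alias, which newer pandas versions prefer over `'1H'`. It also shows one possible way to collapse the 4-level index from the question back into timestamps, in case you want to keep the groupby approach.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data; all values here are made up.
rng = np.random.default_rng(0)
n_events = 1000
start = pd.Timestamp("2015-08-17 15:00:00")

# Random event times within a six-hour window, in milliseconds after `start`.
offsets_ms = np.sort(rng.integers(0, 6 * 3600 * 1000, size=n_events))
index = pd.DatetimeIndex(start + pd.to_timedelta(offsets_ms, unit="ms"),
                         name="timestamp")
df = pd.DataFrame({"size": rng.integers(1_000, 200_000, size=n_events)},
                  index=index)

# Option 1: resample bins the DatetimeIndex into hours and counts the rows.
# Hours without any events show up with a count of 0.
hourly = df["size"].resample("1h").count()

# Option 2: keep the groupby from the question, then rebuild one timestamp
# per group from the (year, month, day, hour) tuples of the MultiIndex.
counts = df.groupby(
    [df.index.year, df.index.month, df.index.day, df.index.hour]
).count()
parts = pd.DataFrame(counts.index.tolist(),
                     columns=["year", "month", "day", "hour"])
counts.index = pd.DatetimeIndex(pd.to_datetime(parts), name="timestamp")

# Either result has a proper DatetimeIndex, so the x-axis is a time axis.
# Uncomment in IPython/Jupyter (requires matplotlib):
# hourly.plot()
```

Note that the two options only agree when every hour contains at least one event: the groupby version simply skips empty hours, while `resample` keeps them as zeros, which is usually what you want for a plot.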
James answered Nov 24 '22 03:11
