I have a pandas DataFrame of events, with the timestamp as the index and a scalar value in the column (its meaning is not important here). I would like to plot a time series of how many events happened during each hour.
The original data (much more than displayed here) looks like this:
                             size
timestamp
2015-08-17 15:07:05.628000  50877
2015-08-17 15:07:05.701000  62989
2015-08-17 15:07:05.752000  33790
2015-08-17 15:07:05.802000 100314
2015-08-17 15:07:05.862000  10372
....
I subsequently grouped these events by hour in the following manner:
counts = df.groupby([df.index.year, df.index.month, df.index.day, df.index.hour]).count()
This leaves me with a four-level MultiIndex.
But now I am struggling to create a nice graph of it. Admittedly, my pandas visualisation skills are very dodgy. I haven't gotten much further than:
counts.plot()
But this makes the x-axis completely unreadable (a sequence of tuples). I'd like the x-axis to be a proper time series that scales nicely with the resolution of the plot, etc. I am doing this in IPython, in case it matters. (I guess the question may come down to how to collapse the four index levels back into a single timestamp.)
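One way to collapse the four levels back into one timestamp is sketched below. The sample events are made up for illustration, and the groupby mirrors the one in the question. pd.to_datetime accepts a DataFrame whose columns are named year, month, day and hour and builds one timestamp per row; the name= argument to to_frame is used so the columns get those names even if the levels share a name.

```python
import pandas as pd

# Made-up sample events standing in for the real data (assumption).
idx = pd.to_datetime([
    "2015-08-17 15:07:05.628",
    "2015-08-17 15:42:10.100",
    "2015-08-17 17:01:00.000",
])
df = pd.DataFrame({"size": [50877, 62989, 33790]}, index=idx)
df.index.name = "timestamp"

# The grouping from the question: a four-level MultiIndex.
counts = df.groupby(
    [df.index.year, df.index.month, df.index.day, df.index.hour]
).count()

# Turn the index levels into columns named the way pd.to_datetime expects.
parts = counts.index.to_frame(index=False, name=["year", "month", "day", "hour"])

# Assemble one timestamp per row and use it as the new (DatetimeIndex) index.
counts.index = pd.to_datetime(parts)
counts.index.name = "hour"
```

With matplotlib installed, counts["size"].plot() then draws against a date-aware x-axis instead of tuples. Note that hours with no events (16:00 here) are simply absent from a groupby result.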
I'd happily go through some kind of reference, so feel free to point me to any useful links to read up. I looked around, but couldn't immediately find anything on the particular topic.
(Also, feel free to suggest alternative ways to achieve this; I'm not sure the multi-level index is the most appropriate choice.)
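One alternative that avoids the MultiIndex entirely, offered as a swapped-in technique rather than the asker's approach: group by each timestamp rounded down to the start of its hour with DatetimeIndex.floor. The key is itself datetime-valued, so the result already has a single-level DatetimeIndex. The sample data is made up.

```python
import pandas as pd

# Made-up sample events (assumption, not the real data).
idx = pd.to_datetime([
    "2015-08-17 15:07:05.628",
    "2015-08-17 15:42:10.100",
    "2015-08-17 17:01:00.000",
])
df = pd.DataFrame({"size": [50877, 62989, 33790]}, index=idx)

# Round every timestamp down to its hour and count the events in each bucket.
hourly = df.groupby(df.index.floor("h"))["size"].count()
```

Like the four-level groupby, this only produces rows for hours that contain at least one event.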
Thanks!
I think what you are looking for is resample. It is designed to handle regrouping by time frames. Try:
df.resample('1h').count().plot()
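A small self-contained check of the resample approach on made-up data. Unlike a groupby, resample creates a bin for every hour in the covered range, so an hour with no events appears with a count of 0 rather than vanishing from the plot. Selecting the size column first yields a Series of counts instead of a one-column DataFrame.

```python
import pandas as pd

# Made-up sample events (assumption); nothing happens between 16:00 and 17:00.
idx = pd.to_datetime([
    "2015-08-17 15:07:05.628",
    "2015-08-17 15:42:10.100",
    "2015-08-17 17:01:00.000",
])
df = pd.DataFrame({"size": [50877, 62989, 33790]}, index=idx)

# One bin per hour; empty hours are kept with a count of 0.
hourly = df["size"].resample("1h").count()
print(hourly)
```

With matplotlib available, hourly.plot() then gives a time series whose x-axis labels adapt to the zoom level. Older pandas versions also accept '1H'; recent versions deprecate the uppercase alias in favour of '1h'.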