Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas: bar plot with multiIndex dataframe

I have a pandas DataFrame with a TIMESTAMP column (not the index), and the timestamp format is as follows:

2015-03-31 22:56:45.510

I also have columns called CLASS and AXLES. I would like to compute the count of records for each month separately for each unique value of AXLES (AXLES can take an integer value between 3-12).

I came up with a combination of resample and groupby:

resamp = dfWIM.set_index('TIMESTAMP').groupby('AXLES').resample('M', how='count').CLASS

This seems to give me a multiIndex dataframe object, as shown below.

In [72]: resamp

Out [72]:

AXLES  TIMESTAMP 
3      2014-07-31      5517
       2014-08-31     31553
       2014-09-30     42816
       2014-10-31     49308
       2014-11-30     44168
       2014-12-31     45518
       2015-01-31     54782
       2015-02-28     52166
       2015-03-31     47929
4      2014-07-31      3147
       2014-08-31     24810
       2014-09-30     39075
       2014-10-31     46857
       2014-11-30     42651
       2014-12-31     48282
       2015-01-31     42708
       2015-02-28     43904
       2015-03-31     50033

From here, how can I access different components of this multiIndex object to create a bar plot for the following conditions?

  • show data when AXLES = 3
  • show x ticks in the Month - Year format (no days, hours, minutes etc.)

Thanks!

EDIT: Following code gives me the plot, but I could not change the xtick formatting to MM-YY.

resamp[3].plot(kind='bar')

enter image description here

EDIT 2 below is a code snippet that generates a small sample of the data similar to what I have:

dftest = {'TIMESTAMP':['2014-08-31','2014-09-30','2014-10-31'], 'AXLES':[3, 3, 3], 'CLASS':[5,6,7]}
dfTest = pd.DataFrame(dftest)
dfTest.TIMESTAMP = pd.to_datetime(pd.Series(dfTest.TIMESTAMP))
resamp = dfTest.set_index('TIMESTAMP').groupby('AXLES').resample('M', how='count').CLASS
resamp[3].plot(kind='bar')

EDIT 3: Here below is the solution:

A.Plot the whole resampled dataframe (based on @Ako 's suggestion):

df = resamp.unstack(0)
df.index = [ts.strftime('%b 20%y') for ts in df.index]
df.plot(kind='bar', rot=0)

enter image description here

B.Plot an individual index from the resampled dataframe (based on @Alexander 's suggestion):

df = resamp[3]
df.index = [ts.strftime('%b 20%y') for ts in df.index]
df.plot(kind='bar', rot=0)

enter image description here

like image 912
marillion Avatar asked Nov 11 '15 00:11

marillion


People also ask

How do I convert MultiIndex to columns?

We can easily convert the multi-level index into the column by the reset_index() method. DataFrame. reset_index() is used to reset the index to default and make the index a column of the dataframe.


2 Answers

You could generate and set the labels explicitly using ax.xaxis.set_major_formatter with a ticker.FixedFormatter. This will allow you to keep your DataFrame's MultiIndex with timestamp values, while displaying the timestamps in the desired %m-%Y format:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.ticker as ticker

dftest = {'TIMESTAMP':['2014-08-31','2014-09-30','2014-10-31'], 'AXLES':[3, 3, 3], 'CLASS':[5,6,7]}
dfTest = pd.DataFrame(dftest)
dfTest.TIMESTAMP = pd.to_datetime(pd.Series(dfTest.TIMESTAMP))
resamp = dfTest.set_index('TIMESTAMP').groupby('AXLES').resample('M', how='count').CLASS

ax = resamp[3].plot(kind='bar')
ticklabels = [timestamp.strftime('%m-%Y') for axle, timestamp in resamp.index]
ax.xaxis.set_major_formatter(ticker.FuncFormatter(lambda x, pos: ticklabels[int(x)]))
plt.gcf().autofmt_xdate()

plt.show()

yields enter image description here

like image 51
unutbu Avatar answered Oct 29 '22 04:10

unutbu


The following should work, but it is difficult to test without some data.

Start by resetting your index to get access to the TIMESTAMP column. Then use strftime to format it to your desired text representation (e.g. mm-yy). Finally, reset the index back to AXLES and TIMESTAMP.

df = resamp.reset_index()
df['TIMESTAMP'] = [ts.strftime('%m-%y') for ts in df.TIMESTAMP]
df.set_index(['AXLES', 'TIMESTAMP'], inplace=True)
>>> df.xs(3, level=0).plot(kind='bar')

enter image description here

like image 35
Alexander Avatar answered Oct 29 '22 03:10

Alexander