
Plot pandas DataFrame against month

I need to create a bar plot of the frequency of rows, grouped by month.

The problem is that the horizontal axis is not a continuous time axis: months with no data are left out entirely.

Example code:

%matplotlib inline
import pandas as pd

d = {'model': 'ep', 
     'date': ('2017-02-02', '2017-02-04', '2017-03-01')}
df1 = pd.DataFrame(d)

d = {'model': 'rs',
     'date': ('2017-01-12', '2017-01-04', '2017-05-01')}
df2 = pd.DataFrame(d)

df = pd.concat([df1, df2])

# Create a column containing the month
df['month'] = pd.to_datetime(df['date']).dt.to_period('M')

# Group by the month and plot
df.groupby('month')['model'].count().plot.bar();

The resulting bar chart is missing the month 2017-04.


How can pandas be made to plot all months, even those with no data?

blokeley asked Apr 28 '17 14:04




2 Answers

For the record, I used this code:

%matplotlib inline
import pandas as pd

d = {'model': 'ep', 
     'date': ('2017-02-02', '2017-02-04', '2017-03-01')}
df1 = pd.DataFrame(d)

d = {'model': 'rs',
     'date': ('2017-01-12', '2017-01-04', '2017-05-01')}
df2 = pd.DataFrame(d)

df = pd.concat([df1, df2])

# Create a column containing the month
df['month'] = pd.to_datetime(df['date']).dt.to_period('M')

# Get the start and end months
months = df['month'].sort_values()
start_month = months.iloc[0]
end_month = months.iloc[-1]

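# Build a continuous monthly index; current pandas no longer accepts
# start/end in the PeriodIndex constructor, so use pd.period_range
index = pd.period_range(start=start_month, end=end_month, freq='M')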

df.groupby('month')['model'].count().reindex(index).plot.bar();

This gives a plot whose x-axis covers every month from 2017-01 to 2017-05, with an empty slot for 2017-04.
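As a quick check without plotting, the reindexed counts can be printed directly. The sketch below is an illustration rather than part of the original answer: it assumes the same six dates, and the names `full_range` and `filled` are made up here. It also passes `fill_value=0` so that empty months show as 0 instead of NaN.

```python
import pandas as pd

# Same six dates as in the question, combined into one frame
dates = ['2017-02-02', '2017-02-04', '2017-03-01',
         '2017-01-12', '2017-01-04', '2017-05-01']
df = pd.DataFrame({'date': dates})
df['month'] = pd.to_datetime(df['date']).dt.to_period('M')

# Rows per month; only months that actually occur appear here
counts = df.groupby('month')['date'].count()

# Continuous monthly range from the first to the last observed month
full_range = pd.period_range(start=counts.index.min(),
                             end=counts.index.max(), freq='M')

# Reindex onto the full range; months with no rows become 0
filled = counts.reindex(full_range, fill_value=0)
print(filled)
```

Calling `filled.plot.bar()` on the result should then draw a zero-height bar for the empty month.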

Thanks to EdChum

blokeley answered Nov 03 '22 01:11


You can reindex and pass a constructed PeriodIndex to achieve this:

df.groupby('month')['model'].count().reindex(pd.period_range(start=df['month'].sort_values().iloc[0], periods=5, freq='M')).plot.bar()


The reindexed result takes on the new index, which has no name, so the index name is lost. You can restore it:

gp = df.groupby('month')['model'].count()
gp = gp.reindex(pd.period_range(start=df['month'].sort_values().iloc[0], periods=5, freq='M'))
gp.index.name = 'month'
gp.plot.bar()

This restores the 'month' label on the plot's x-axis.
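An alternative, sketched here as an assumption rather than as part of the original answer, is to name the index when it is built: `pd.period_range` accepts a `name` argument, so the reindexed result should carry the name without a separate assignment. The combined frame below stands in for the question's `df`.

```python
import pandas as pd

# Stand-in for the question's concatenated frame
df = pd.DataFrame({
    'model': ['ep', 'ep', 'ep', 'rs', 'rs', 'rs'],
    'date': ['2017-02-02', '2017-02-04', '2017-03-01',
             '2017-01-12', '2017-01-04', '2017-05-01'],
})
df['month'] = pd.to_datetime(df['date']).dt.to_period('M')

gp = df.groupby('month')['model'].count()

# Name the index up front so reindex does not drop it
index = pd.period_range(start=df['month'].min(), periods=5,
                        freq='M', name='month')
gp = gp.reindex(index)

print(gp.index.name)
```

Note that `periods=5` is hard-coded to this data set; passing `end=df['month'].max()` instead would adapt to other date ranges.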

EdChum answered Nov 03 '22 00:11