Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to create groupby subplots in Pandas?

I've got a dataframe with timeseries data of crime with a facet on offence (which looks like the format below). What I'd like to perform a groupby plot on the dataframe so that it's possible to explore trends in crime over time.

    Offence                     Rolling year total number of offences       Month
0   Criminal damage and arson   1001                                        2003-03-31
1   Drug offences               66                                         2003-03-31
2   All other theft offences    617                                   2003-03-31
3   Bicycle theft               92                                    2003-03-31
4   Domestic burglary           282                                   2003-03-31

I've got some code which does the job, but it's a bit clumsy and it loses the time series formatting that Pandas delivers on a single plot. (I've included an image to illustrate). Can anyone suggest an idiom for such plots that I can use?

I would turn to Seaborn but I can't work out how to format the xlabel as timeseries.

subs = []
for idx, (i, g) in enumerate(df.groupby("Offence")):
        subs.append({"data": g.set_index("Month").resample("QS-APR", how="sum").ix["2010":],
                     "title":i})

ax = plt.figure(figsize=(25,15))
for i,g in enumerate(subs):
    plt.subplot(5, 5, i)
    plt.plot(g['data'])
    plt.title(g['title'])
    plt.xlabel("Time")
    plt.ylabel("No. of crimes")
    plt.tight_layout()

like image 587
elksie5000 Avatar asked Oct 15 '15 13:10

elksie5000


People also ask

How can I plot separate Pandas Dataframes as subplots?

You can manually create the subplots with matplotlib, and then plot the dataframes on a specific subplot using the ax keyword. For example for 4 subplots (2x2): import matplotlib. pyplot as plt fig, axes = plt.

Can you use Groupby with multiple columns in Pandas?

groupby() can take the list of columns to group by multiple columns and use the aggregate functions to apply single or multiple aggregations at the same time.


2 Answers

This is a reproducible example of 6 scatterplots in Pandas, obtained from pd.groupby() for 6 consecutive years. On x axis -- there is oil price (brent) for the year, on y -- the value for sp500 for the same year.

import matplotlib.pyplot as plt
import pandas as pd
import Quandl as ql
%matplotlib inline

brent = ql.get('FRED/DCOILBRENTEU')
sp500 = ql.get('YAHOO/INDEX_GSPC')
values = pd.DataFrame({'brent':brent.VALUE, 'sp500':sp500.Close}).dropna()["2009":"2015"]

fig, axes = plt.subplots(2,3, figsize=(15,5))
for (year, group), ax in zip(values.groupby(values.index.year), axes.flatten()):
    group.plot(x='brent', y='sp500', kind='scatter', ax=ax, title=year)

This produces the below plot:

enter image description here

(Just in case, from these plots you may infer there was a strong correlation between oil and sp500 in 2010 but not in other years).

You may change kind in group.plot() so that it suits your specific kind or data. My anticipation, pandas will preserve the date formatting for x-axis if you have it in your data.

like image 94
Sergey Bushmanov Avatar answered Nov 01 '22 09:11

Sergey Bushmanov


Altair can work great in such cases.

import matplotlib.pyplot as plt
import pandas as pd
import quandl as ql

df = ql.get(["NSE/OIL.1", "WIKI/AAPL.1"], start_date="2013-1-1")
df.columns = ['OIL', 'AAPL']
df['year'] = df.index.year

from altair import *

Viz #1- No color by year/No columns by year

Chart(df).mark_point(size=1).encode(x='AAPL',y='OIL').configure_cell(width=200, height=150)

enter image description here

Viz #2- No color by year/columns by year

Chart(df).mark_point(size=1).encode(x='AAPL',y='OIL', column='year').configure_cell(width=140, height=70).configure_facet_cell(strokeWidth=0)

enter image description here

Viz #3- Color by year

Chart(df).mark_point(size=1).encode(x='AAPL',y='OIL', color='year:N').configure_cell(width=140, height=70)

enter image description here

like image 33
Nipun Batra Avatar answered Nov 01 '22 11:11

Nipun Batra