I've got a dataframe of crime time series data, with one series per offence (the format is shown below). I'd like to produce a groupby plot from the dataframe, one subplot per offence, so that it's possible to explore trends in crime over time.
   Offence                    Rolling year total number of offences  Month
0  Criminal damage and arson  1001                                   2003-03-31
1  Drug offences              66                                     2003-03-31
2  All other theft offences   617                                    2003-03-31
3  Bicycle theft              92                                     2003-03-31
4  Domestic burglary          282                                    2003-03-31
I've got some code which does the job, but it's a bit clumsy and it loses the time series formatting that Pandas delivers on a single plot. (I've included an image to illustrate). Can anyone suggest an idiom for such plots that I can use?
I would turn to Seaborn but I can't work out how to format the xlabel as timeseries.
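subs = []
for name, g in df.groupby("Offence"):
    # Quarterly totals (quarters anchored on April), from 2010 onwards.
    # Assumes "Month" is already a datetime column.
    counts = g.set_index("Month")["Rolling year total number of offences"]
    subs.append({"data": counts.resample("QS-APR").sum().loc["2010":],
                 "title": name})

fig = plt.figure(figsize=(25, 15))
for i, g in enumerate(subs):
    plt.subplot(5, 5, i + 1)  # subplot indices start at 1, not 0
    plt.plot(g["data"])
    plt.title(g["title"])
    plt.xlabel("Time")
    plt.ylabel("No. of crimes")
plt.tight_layout()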
You can create the subplots manually with matplotlib, and then draw each dataframe on a specific subplot by passing that subplot to the ax keyword of the plot call. For example for 4 subplots (2x2): import matplotlib.pyplot as plt fig, axes = plt.
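The approach described above can be sketched roughly as follows. This is a minimal illustration, not the answerer's original code: the four frames, their names, and the "value" column are made up, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Four small made-up frames, one per subplot (illustrative data only).
idx = pd.date_range("2010-01-01", periods=12, freq="MS")
rng = np.random.default_rng(0)
frames = {
    f"series {k}": pd.DataFrame({"value": rng.integers(0, 100, len(idx))}, index=idx)
    for k in range(4)
}

# Create the 2x2 grid once, then hand each Axes to DataFrame.plot via ax=.
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 6))
for (name, frame), ax in zip(frames.items(), axes.flatten()):
    frame.plot(ax=ax, title=name, legend=False)
fig.tight_layout()
```

Because each frame has a DatetimeIndex and is drawn through DataFrame.plot, pandas should apply its own date formatting to the x-axis of every panel.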
groupby() also accepts a list of column names, which groups by several columns at once, and the aggregate functions can then apply one or several aggregations in a single call.
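As a small sketch of that point, assuming a made-up frame with hypothetical "Year" and "Count" columns (only the offence names come from the question):

```python
import pandas as pd

# Made-up data for illustration; "Year" and "Count" are hypothetical columns.
df = pd.DataFrame({
    "Offence": ["Drug offences", "Drug offences", "Bicycle theft", "Bicycle theft"],
    "Year": [2003, 2003, 2003, 2004],
    "Count": [66, 70, 92, 88],
})

# Group by two columns, then apply two aggregations to "Count" in one call.
summary = df.groupby(["Offence", "Year"])["Count"].agg(["sum", "mean"])
print(summary)
```

The result is indexed by (Offence, Year) pairs, with one column per aggregation.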
This is a reproducible example of 6 scatterplots in pandas, obtained by calling groupby() on a DataFrame to split it into consecutive years. The x axis shows the oil price (Brent) for the year, the y axis the value of the S&P 500 for the same year.
import matplotlib.pyplot as plt
import pandas as pd
import Quandl as ql
%matplotlib inline
brent = ql.get('FRED/DCOILBRENTEU')
sp500 = ql.get('YAHOO/INDEX_GSPC')
values = pd.DataFrame({'brent':brent.VALUE, 'sp500':sp500.Close}).dropna()["2009":"2015"]
fig, axes = plt.subplots(2,3, figsize=(15,5))
for (year, group), ax in zip(values.groupby(values.index.year), axes.flatten()):
    group.plot(x='brent', y='sp500', kind='scatter', ax=ax, title=year)
This produces the below plot:
(Just in case, from these plots you may infer there was a strong correlation between oil and sp500 in 2010 but not in other years).
You may change kind in group.plot() so that it suits your specific kind of data. I would expect pandas to preserve the date formatting on the x-axis if your data has it.
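Applied to data shaped like the crime table in the question, the same idiom might look like the sketch below. The values are randomly generated, "Count" is a hypothetical short name standing in for the "Rolling year total number of offences" column, and the Agg backend is chosen only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up monthly data in the long format of the question.
months = pd.date_range("2010-01-01", periods=24, freq="MS")
offences = ["Criminal damage and arson", "Drug offences",
            "Bicycle theft", "Domestic burglary"]
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Offence": [o for o in offences for _ in months],
    "Month": list(months) * len(offences),
    "Count": rng.integers(50, 1000, len(months) * len(offences)),
})

# One subplot per offence: pair each group with an Axes and plot onto it.
fig, axes = plt.subplots(2, 2, figsize=(12, 6))
for (offence, group), ax in zip(df.groupby("Offence"), axes.flatten()):
    quarterly = group.set_index("Month")["Count"].resample("QS-APR").sum()
    quarterly.plot(ax=ax, title=offence)  # Series.plot formats the date axis
    ax.set_xlabel("Time")
    ax.set_ylabel("No. of crimes")
fig.tight_layout()
```

Plotting each resampled Series through pandas rather than plt.plot is what should keep the time series tick formatting on every panel.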
Altair can work great in such cases.
import matplotlib.pyplot as plt
import pandas as pd
import quandl as ql
df = ql.get(["NSE/OIL.1", "WIKI/AAPL.1"], start_date="2013-1-1")
df.columns = ['OIL', 'AAPL']
df['year'] = df.index.year
from altair import *
Chart(df).mark_point(size=1).encode(x='AAPL',y='OIL').configure_cell(width=200, height=150)
Chart(df).mark_point(size=1).encode(x='AAPL',y='OIL', column='year').configure_cell(width=140, height=70).configure_facet_cell(strokeWidth=0)
Chart(df).mark_point(size=1).encode(x='AAPL',y='OIL', color='year:N').configure_cell(width=140, height=70)