I am making a series of bar plots of data with two categorical variables and one numeric. What I have is below, but what I would love to do is facet by one of the categorical variables, as with facet_wrap in ggplot. I have a somewhat working example, but I get the wrong plot type (lines, not bars), and I subset the data in a loop, which can't be the best way.
## first try--plain vanilla
import pandas as pd
import numpy as np
N = 100
## generate toy data
ind = np.random.choice(['a','b','c'], N)
cty = np.random.choice(['x','y','z'], N)
jobs = np.random.randint(low=1,high=250,size=N)
## prep data frame
df_city = pd.DataFrame({'industry':ind,'city':cty,'jobs':jobs})
df_city_grouped = df_city.groupby(['city','industry']).jobs.sum().unstack()
df_city_grouped.plot(kind='bar',stacked=True,figsize=(9, 6))
This gives something like this:
  city industry  jobs
0    z        b   180
1    z        c   121
2    x        a    33
3    z        a   121
4    z        c   236
However, what I would like to see is something like this:
## R code
library(plyr)
library(ggplot2)
df_city <- read.csv('/home/aksel/Downloads/mockcity.csv', sep='\t')
## summarize
df_city_grouped <- ddply(df_city, .(city,industry), summarise, jobstot = sum(jobs))
## plot
ggplot(df_city_grouped, aes(x=industry, y=jobstot)) +
  geom_bar(stat='identity') +
  facet_wrap(~city)
The closest I get with matplotlib is something like this:
import matplotlib.pyplot as plt

## one panel per city
cols = df_city.city.value_counts().shape[0]
fig, axes = plt.subplots(1, cols, figsize=(8, 8))
for x, city in enumerate(df_city.city.value_counts().index.values):
    data = df_city[(df_city['city'] == city)]
    data = data.groupby(['industry']).jobs.sum()
    axes[x].plot(data)  ## draws lines, not bars
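So two questions: can I get bars instead of lines, and is there a better way to facet than subsetting the data in a loop, as in the ggplot example?
Second example here: http://pandas-docs.github.io/pandas-docs-travis/visualization.html#bar-plots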
Anyway, you can always do that by hand, as you did yourself.
EDIT: BTW, you can always use rpy2 in Python, so you can do all the same things as in R.
Also, have a look at this: https://pandas.pydata.org/pandas-docs/version/0.14.1/rplot.html. I am not sure, but it should be helpful for creating plots over many panels, though it might require further reading.
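For what it's worth, here is a rough sketch of how the bar-plot route could give one panel per city without a hand-written loop. It assumes the same df_city columns as in the question (the toy data and the names totals and panels are made up here), and it leans on the subplots, layout and sharey arguments of DataFrame.plot. It is not the same thing as facet_wrap, just something close to it.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import numpy as np
import pandas as pd

# toy data shaped like the question's df_city (seeded so it is repeatable)
rng = np.random.RandomState(0)
N = 100
df_city = pd.DataFrame({
    "industry": rng.choice(["a", "b", "c"], N),
    "city": rng.choice(["x", "y", "z"], N),
    "jobs": rng.randint(low=1, high=250, size=N),
})

# rows = industry (x axis of each panel), columns = city (one panel each)
totals = (
    df_city.groupby(["industry", "city"])["jobs"]
    .sum()
    .unstack("city", fill_value=0)
)

# subplots=True draws each column of totals on its own axes
axes = totals.plot(
    kind="bar",
    subplots=True,
    layout=(1, totals.shape[1]),
    sharey=True,
    legend=False,
    figsize=(9, 4),
)
panels = np.asarray(axes).ravel()
fig = panels[0].get_figure()
fig.suptitle("Employment By Industry By City")
```

Call fig.savefig(...) or plt.show() afterwards to see the result. Unstacking on city rather than industry is what decides which variable becomes the panels.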
@tcasell suggested the bar call in the loop. Here is a working, if not elegant, example.
## second try--facet by city
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

N = 100
industry = ['a','b','c']
city = ['x','y','z']
ind = np.random.choice(industry, N)
cty = np.random.choice(city, N)
jobs = np.random.randint(low=1,high=250,size=N)
df_city = pd.DataFrame({'industry':ind,'city':cty,'jobs':jobs})
## how many panels do we need?
cols = df_city.city.value_counts().shape[0]
fig, axes = plt.subplots(1, cols, figsize=(8, 8))
for x, city in enumerate(df_city.city.value_counts().index.values):
    data = df_city[(df_city['city'] == city)]
    data = data.groupby(['industry']).jobs.sum()
    print(data)
    print(type(data.index))
    ## bar positions (0, 1, 2, ...) and bar heights (job totals)
    left = [k[0] for k in enumerate(data)]
    right = [k[1] for k in enumerate(data)]
    axes[x].bar(left, right, label="%s" % (city))
    axes[x].set_xticks(left, minor=False)
    axes[x].set_xticklabels(data.index.values)
    axes[x].legend(loc='best')
    axes[x].grid(True)
fig.suptitle('Employment By Industry By City', fontsize=20)
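If you need this more than once, the loop can be wrapped in a small function. The sketch below is only one way to do it: facet_bar and its parameter names are made up for illustration, not part of pandas or matplotlib. It iterates over groupby instead of subsetting by hand, shares the y axis so panels are comparable, and reindexes so every panel shows the same categories even when a city has no rows for one of them.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def facet_bar(df, facet, x, y, figsize=(9, 4)):
    """Hypothetical helper: one bar panel per value of df[facet]."""
    categories = sorted(df[x].unique())
    groups = list(df.groupby(facet))
    fig, axes = plt.subplots(
        1, len(groups), figsize=figsize, sharey=True, squeeze=False
    )
    positions = np.arange(len(categories))
    for ax, (name, sub) in zip(axes[0], groups):
        # sum y per category; categories missing from this panel become 0
        totals = sub.groupby(x)[y].sum().reindex(categories, fill_value=0)
        ax.bar(positions, totals.values)
        ax.set_xticks(positions)
        ax.set_xticklabels(categories)
        ax.set_title("%s = %s" % (facet, name))
        ax.grid(True)
    return fig, axes[0]


# usage with toy data shaped like df_city above
rng = np.random.RandomState(1)
N = 100
df_city = pd.DataFrame({
    "industry": rng.choice(["a", "b", "c"], N),
    "city": rng.choice(["x", "y", "z"], N),
    "jobs": rng.randint(low=1, high=250, size=N),
})
fig, axes = facet_bar(df_city, facet="city", x="industry", y="jobs")
fig.suptitle("Employment By Industry By City")
```

This lays all panels out in a single row. facet_wrap would wrap them onto several rows, which this sketch does not attempt.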