 

Python, Seaborn: Plotting frequencies with zero-values

I have a Pandas series with values for which I'd like to plot counts. This creates roughly what I want:

import matplotlib.pyplot as plt
import seaborn as sns

dy = sns.countplot(x=rated.year, color="#53A2BE")
dy.set(xlabel='Release Year', ylabel="Count")
dy.spines['top'].set_color('none')
dy.spines['right'].set_color('none')
plt.show()

The problem comes with missing data. There are 31 years with ratings, but over a timespan of 42 years. That means there should be some empty bins, which are not being displayed. Is there a way to configure this in Seaborn/Matplotlib? Should I use another type of graph, or is there another fix for this?

I thought about looking into whether it is possible to configure it as a time series, but I have the same problem with rating scales. So, on a 1-10 scale the count for e.g. 4 might be zero, and therefore '4' is not in the Pandas data series, which means it also does not show up in the graph.
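A minimal sketch of the rating-scale case, using made-up ratings on a 1-10 scale (the values are hypothetical). value_counts() only reports values that actually occur, so a rating nobody gave never appears in its index, and a plot built from those values has no bin for it.

```python
import pandas as pd

# Hypothetical ratings on a 1-10 scale; nobody gave a 1 or a 4.
ratings = pd.Series([7, 8, 5, 10, 3, 7, 6, 9, 2, 8])

counts = ratings.value_counts()

# value_counts() lists only the values that occur, so 4 is absent
# from the index instead of being reported with a count of 0.
print(4 in counts.index)             # False
print(sorted(counts.index.tolist()))  # [2, 3, 5, 6, 7, 8, 9, 10]
```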

The result I'd like is the full scale on the x-axis, with counts (for steps of one) on the y-axis, and showing zero/empty bins for missing instances of the scale, instead of simply showing the next bin for which data is available.
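One way to get counts in exactly this shape before plotting is a pandas categorical whose categories are the full scale. This is a swapped-in technique, not something from the question, and the ratings below are hypothetical. Categories that never occur are still part of the dtype, so value_counts() reports them with a count of 0.

```python
import pandas as pd

# Hypothetical ratings again; no 1s and no 4s were given.
ratings = pd.Series([7, 8, 5, 10, 3, 7, 6, 9, 2, 8])

# Declare the whole 1-10 scale as the categories.
scale = list(range(1, 11))
as_cat = pd.Series(pd.Categorical(ratings, categories=scale))

# sort=False keeps the counts in scale order instead of sorting by frequency.
counts = as_cat.value_counts(sort=False)

print(counts.tolist())  # [0, 1, 1, 0, 1, 1, 2, 2, 1, 1]
```

Because the scale is declared up front, empty values at either end (here 1) are kept as well, which a range derived from the data's minimum and maximum would miss.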

EDIT:

The data (rated.year) looks something like this:

import pandas as pd

rated = pd.DataFrame(data = [2016, 2004, 2007, 2010, 2015, 2016, 2016, 2015,
                             2011, 2010, 2016, 1975, 2011, 2016, 2015, 2016, 
                             1993, 2011, 2013, 2011], columns = ["year"])

It has more values, but the format is the same. As you can see from

rated.year.value_counts()

there are quite a few x values whose count would have to be zero in the graph. Currently the plot looks like this:

[Image: current Seaborn count plot]
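For the sample data above, a quick plain-pandas sketch (no plotting) counts how many years inside the observed span have no ratings at all. Those are the bars that should be drawn with height zero. The numbers apply to this 20-value sample only, not to the full data set.

```python
import pandas as pd

# The sample data from the question.
rated = pd.DataFrame(data=[2016, 2004, 2007, 2010, 2015, 2016, 2016, 2015,
                           2011, 2010, 2016, 1975, 2011, 2016, 2015, 2016,
                           1993, 2011, 2013, 2011], columns=["year"])

# Every year from the earliest to the latest release, inclusive.
full_span = range(rated.year.min(), rated.year.max() + 1)

# Years that occur in the data at least once.
present = set(rated.year.unique().tolist())

# Years in the span without any ratings: the missing (zero-count) bins.
missing = [y for y in full_span if y not in present]

print(len(full_span), len(present), len(missing))  # 42 9 33
```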

asked Jul 27 '17 at 13:07 by vander


2 Answers

I solved the problem with the solution @mwaskom suggested in the comments on my question: pass an 'order' argument to countplot that lists every valid year, including those with a count of zero. This is the code that produces the graph:

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

rated = pd.DataFrame(data = [2016, 2004, 2007, 2010, 2015, 2016, 2016, 2015,
                             2011, 2010, 2016, 1975, 2011, 2016, 2015, 2016, 
                             1993, 2011, 2013, 2011], columns = ["year"])

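# order lists every year from the earliest to the latest release, so years
# without any ratings still get an (empty) position on the x-axis
dy = sns.countplot(x=rated.year, color="#53A2BE",
                   order=list(range(rated.year.min(), rated.year.max() + 1)))
dy.set(xlabel='Release Year', ylabel="Count")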
dy.spines['top'].set_color('none')
dy.spines['right'].set_color('none')
plt.show()
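An order built from the data's minimum and maximum fills gaps inside the range, but it cannot produce empty bins at either end. For a fixed scale such as the 1-10 ratings mentioned in the question, it may be safer to pass the scale bounds explicitly. A minimal sketch with a hypothetical helper, full_order, and made-up ratings:

```python
import pandas as pd

def full_order(values, lo=None, hi=None):
    """Hypothetical helper: every integer from lo to hi, inclusive.

    lo and hi default to the data's min and max, mirroring
    range(rated.year.min(), rated.year.max() + 1) in the answer above.
    """
    lo = int(values.min()) if lo is None else lo
    hi = int(values.max()) if hi is None else hi
    return list(range(lo, hi + 1))

# Hypothetical 1-10 ratings with no 1s, 4s or 10s.
ratings = pd.Series([7, 8, 5, 3, 7, 6, 9, 2, 8])

print(full_order(ratings))         # [2, 3, 4, 5, 6, 7, 8, 9]
print(full_order(ratings, 1, 10))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

The returned list can be passed as order to countplot in the same way as in the code above.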
answered Sep 23 '22 at 00:09 by vander


Consider a Seaborn barplot, built from a series of counts that is reindexed over the full range of years and converted to a DataFrame:

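import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# REINDEXED DATAFRAME: count each year, reindex over the full span of years
# so missing years get a count of 0, then name the columns explicitly
# (the default names after reset_index() differ between pandas 1.x and 2.x)
year_span = range(rated.year.min(), rated.year.max() + 1)
rated_ser = (rated['year'].value_counts()
             .reindex(year_span, fill_value=0)
             .rename_axis('year')
             .reset_index(name='count'))

# SNS BAR PLOT
dy = sns.barplot(x='year', y='count', data=rated_ser, color="#53A2BE")
dy.tick_params(axis='x', labelrotation=90)   # ROTATE LABELS, 90 DEG.
dy.set(xlabel='Release Year', ylabel="Count")

dy.spines['top'].set_color('none')
dy.spines['right'].set_color('none')
plt.show()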

[Image: Seaborn bar plot output]
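A plotting-free sketch of the data preparation, using the question's sample data, makes it easy to check that the frame handed to barplot has one row per year, with zeros for the gaps. rename_axis and reset_index(name=...) fix the column names explicitly here, because value_counts() followed by a bare reset_index() gives columns 'index'/'year' in pandas 1.x but 'year'/'count' in pandas 2.x.

```python
import pandas as pd

# The sample data from the question.
rated = pd.DataFrame(data=[2016, 2004, 2007, 2010, 2015, 2016, 2016, 2015,
                           2011, 2010, 2016, 1975, 2011, 2016, 2015, 2016,
                           1993, 2011, 2013, 2011], columns=["year"])

# Count, reindex over every year in the span (gaps become 0), name columns.
year_span = range(rated.year.min(), rated.year.max() + 1)
counts = (rated["year"].value_counts()
          .reindex(year_span, fill_value=0)
          .rename_axis("year")
          .reset_index(name="count"))

print(counts.shape)                                        # (42, 2)
print(int((counts["count"] == 0).sum()))                   # 33
print(counts.loc[counts["year"] == 2016, "count"].item())  # 6
```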

answered Sep 24 '22 at 00:09 by Parfait