
weights option for seaborn distplot?

I'd like to have a weights option in seaborn distplot, similar to the one in numpy histogram. Without this option, the only alternative would be to apply the weighting to the input array itself, which could make the array impractically large (and slow to process).

nbecker asked Jul 29 '15 14:07




2 Answers

You can provide weights by passing them to the underlying matplotlib hist function through the hist_kws argument:

sns.distplot(..., hist_kws={'weights': your_weights_array}, ...)

Note, though, that the weights are passed only to the underlying histogram; neither the KDE nor the fit function of distplot is affected.
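To make the trade-off from the question concrete, here is a minimal sketch of why a weights array can stand in for a physically expanded input. It calls numpy.histogram directly rather than seaborn, since matplotlib's hist accepts a weights argument with the same meaning. The values, counts, and bin edges are made up for illustration.

```python
import numpy as np

# Hypothetical data: three distinct values, each with an integer weight (count).
values = np.array([1.0, 2.0, 3.0])
counts = np.array([2, 5, 3])
bins = np.array([0.5, 1.5, 2.5, 3.5])

# Weighted histogram: each value adds its weight to the bin it falls into.
weighted, _ = np.histogram(values, bins=bins, weights=counts)

# Same histogram built by repeating every value `count` times,
# which is the approach that can become impractically large.
expanded = np.repeat(values, counts)
repeated, _ = np.histogram(expanded, bins=bins)

print(weighted)
print(repeated)
```

Both calls produce the same bin totals, but the weighted version never allocates the expanded array.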

vlasisva answered Oct 15 '22 16:10


As @vlasisva already mentioned in their answer, weights should be provided through the keyword argument hist_kws so that they are passed to matplotlib's hist function. However, this will not have the desired effect unless the kde (kernel density estimation) option is disabled at the same time. This code has the desired effect:

sns.distplot(x, hist_kws={'weights': x_weights}, kde=False)

To understand why weights and kde should not be combined, consider the following example, where x_weights is calculated as x_weights = np.ones_like(x) / len(x) so that the heights of all bins sum to 1:

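import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# generate 1000 samples from a normal distribution
np.random.seed(8362)
x = np.random.normal(size=1000)

# calculate weights so that the bar heights sum to 1
x_weights = np.ones_like(x) / len(x)

# figure 1 - use weights, KDE disabled
plt.figure()
sns.distplot(x, hist_kws={'weights': x_weights}, kde=False)

# figure 2 - default plot with KDE
plt.figure()
sns.distplot(x)

plt.show()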

Figure 1. Using distplot with weights and without KDE

Figure 2. Using distplot with default parameters

In Figure 1 we passed weights to distplot, so the heights of all bins in this figure sum to 1. In Figure 2 the default behaviour of distplot is enabled, so the area under the KDE curve integrates to 1 and the bin heights are normalised to match it. This makes it clear why plotting the KDE together with weighted bins would not make much sense: the two would be drawn on different scales.
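The two normalisations behind Figure 1 and Figure 2 can be checked numerically without plotting. The sketch below uses numpy.histogram as a stand-in for the histogram that distplot draws. The choice of 20 bins is an assumption for illustration only, since distplot picks its own bin count.

```python
import numpy as np

# Same kind of data as in the example above: 1000 normal samples.
rng = np.random.RandomState(8362)
x = rng.normal(size=1000)
x_weights = np.ones_like(x) / len(x)

# Figure 1 case: weights of 1/N, no density normalisation.
# Each bar height is the fraction of samples in that bin.
heights, edges = np.histogram(x, bins=20, weights=x_weights)
print("sum of bar heights:", heights.sum())

# Figure 2 case: density normalisation, as used alongside a KDE.
# Here the bar *areas* (height * width) add up to 1, not the heights.
density, edges = np.histogram(x, bins=20, density=True)
widths = np.diff(edges)
print("sum of bar areas:", (density * widths).sum())
print("sum of bar heights:", density.sum())
```

The first and second printed values are 1 up to floating-point error, while the third generally is not, which is the scale mismatch between the weighted histogram and a KDE curve.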

myrs answered Oct 15 '22 16:10