
`ValueError: operands could not be broadcast together` when attempting to plot a univariate distribution from a DataFrame column using Seaborn

I'm trying to plot the univariate distribution of a column in a Pandas DataFrame. Here's the code:

ad = summary["Acquired Delay"]
sns.distplot(ad)

This throws:

ValueError: operands could not be broadcast together with shapes (9,) (10,) (9,)

I've checked whether anything is wrong with the series itself by passing it as ad.values, but the same error occurs. The problem disappears when I use the .plot method of ad:

ad = summary["Acquired Delay"]
ad.plot.hist()

[Image: successful plot with matplotlib]

The plot is less translucent, but reasonably good. Is this a common bug in seaborn? Did it happen because my data contains a large number of zeros?

Aseem Awad asked Nov 20 '25 00:11



1 Answer

This is happening because the seaborn function distplot includes the lines

    if bins is None:
        bins = min(_freedman_diaconis_bins(a), 50)

to set the number of bins when it's not specified, and the _freedman_diaconis_bins function can return a non-integer number if the IQR is 0 and the length of a isn't a perfect square (with a zero IQR it falls back to the square root of the sample size). And if a is dominated by enough zeros, the IQR will be zero, e.g.

>>> sns.distributions.iqr([0]*8 + [1]*2)
0.0

So your intuition that the large number of zeros might be playing a role was right, I think. Anyway, if we get a float back for the number of bins, that will break np.histogram:

>>> np.histogram([0,0,1], bins=2)
(array([2, 1], dtype=int32), array([ 0. ,  0.5,  1. ]))
>>> np.histogram([0,0,1], bins=2.1)
Traceback (most recent call last):
  File "<ipython-input-4-9aae3e6c77af>", line 1, in <module>
    np.histogram([0,0,1], bins=2.1)
  File "/home/dsm/sys/pys/3.5/lib/python3.5/site-packages/numpy/lib/function_base.py", line 249, in histogram
    n += np.bincount(indices, weights=tmp_w, minlength=bins).astype(ntype)
ValueError: operands could not be broadcast together with shapes (2,) (3,) (2,) 
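To make the two steps above concrete, here is a minimal sketch of the bin rule as described in this answer. It is not copied from seaborn: the function name fd_bins_sketch is hypothetical, and the IQR is computed with np.percentile as an assumption about how the library does it.

```python
import numpy as np


def fd_bins_sketch(a):
    """Hypothetical re-implementation of the bin rule described above."""
    a = np.asarray(a, dtype=float)
    q75, q25 = np.percentile(a, [75, 25])
    iqr = q75 - q25
    # Freedman-Diaconis bin width
    h = 2 * iqr / (len(a) ** (1 / 3))
    if h == 0:
        # Assumed fallback when the IQR is zero: the square root of the
        # sample size, returned without rounding
        return np.sqrt(a.size)
    return np.ceil((a.max() - a.min()) / h)


data = [0] * 8 + [1] * 2      # mostly zeros, so the IQR is 0
n_bins = fd_bins_sketch(data)
print(n_bins)                 # square root of 10, not a whole number
```

With nine values instead of ten (a perfect square), the same fallback happens to give a whole number, which is why the error only shows up for some lengths.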

So I think this is a bug, and you could open a ticket. You can work around it by passing the number of bins directly:

sns.distplot(ad, bins=10)

or if you really wanted, you could monkeypatch a fix with something like

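# Keep a reference to the original, then wrap it so the result is always an int
# (np.round alone still returns a float)
sns.distributions._freedman_diaconis_bins_orig = (
    sns.distributions._freedman_diaconis_bins)
sns.distributions._freedman_diaconis_bins = lambda x: int(
    np.round(sns.distributions._freedman_diaconis_bins_orig(x)))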
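If you'd rather not hard-code the bin count for the first workaround, a rough sketch is to coerce whatever float you compute to a plain int before handing it over. The variable names here are illustrative only, and the square-root value just stands in for the kind of float described above.

```python
import numpy as np

data = [0] * 8 + [1] * 2

raw_bins = np.sqrt(len(data))       # a float bin count, like the one that caused the error
n_bins = int(np.ceil(raw_bins))     # round up and convert to a plain Python int

# An integer bin count is accepted without complaint
counts, edges = np.histogram(data, bins=n_bins)
print(n_bins, counts, edges)
```

The same n_bins can then be passed as bins= to the plotting call from the question.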
DSM answered Nov 21 '25 12:11



