Distribution-type graphs (histogram/kde) with weighted data

Tags: python, pandas, matplotlib, seaborn, bokeh

In a nutshell, what is my best option for distribution-type graphs (histogram or kde) when my data is weighted?

import pandas as pd

df = pd.DataFrame({ 'x':[1,2,3,4], 'wt':[7,5,3,1] })

df.x.plot(kind='hist',weights=df.wt.values)

That works fine but seaborn won't accept a weights kwarg, i.e.

sns.distplot( df.x, bins=4,              # doesn't work like this
              weights=df.wt.values )     # or with kde=False added

It would also be nice if kde would accept weights but neither pandas nor seaborn seems to allow it.

I realize btw that the data could be expanded to fake the weighting. That's easy here, but not of much use with my real data, where the weights are in the hundreds or thousands, so I'm not looking for a workaround like that.

Anyway, that's all. I'm just trying to find out what (if anything) I can do with weighted data besides the basic pandas histogram. I haven't fooled around with bokeh yet, but bokeh suggestions are also welcome.

867

asked Apr 27 '15 02:04

JohnE

2 Answers

Seaborn uses the same matplotlib plotting functions that pandas uses.

As the documentation states, sns.distplot does not accept a weights argument. It does, however, take a hist_kws argument, whose contents are passed to the underlying call to plt.hist. So this should do what you want:

sns.distplot(df.x, bins=4, hist_kws={'weights':df.wt.values})
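To see what the weights do in that call without plotting anything, here is a minimal NumPy-only sketch. It assumes, as is the case in current matplotlib, that plt.hist hands the weights on to np.histogram. The variable names are just for illustration; the data is the toy frame from the question.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
wt = np.array([7, 5, 3, 1])

# Weighted histogram: each observation adds its weight to its bin
# instead of adding 1.
weighted_counts, edges = np.histogram(x, bins=4, weights=wt)

# The "expanded" version of the same data: every x repeated wt times.
expanded = np.repeat(x, wt)
expanded_counts, _ = np.histogram(expanded, bins=4, range=(x.min(), x.max()))

print(weighted_counts)
print(expanded_counts)
```

Both calls give the same bin totals, so for integer weights the weights argument is equivalent to expanding the data, just without building the large array.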

199

answered Sep 30 '22 17:09

hitzg

I solved this problem by resampling the data points according to their weights.

You can do it like this:

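from bisect import bisect
from random import random

import numpy as np
import seaborn as sns

def weighted_choice(choices):
    """Pick one value from (value, weight) pairs, with probability proportional to weight."""
    values, weights = zip(*choices)
    # Build the running total of the weights.
    total = 0
    cum_weights = []
    for w in weights:
        total += w
        cum_weights.append(total)
    # Draw a point in [0, total) and find the interval it falls into.
    x = random() * total
    i = bisect(cum_weights, x)
    return values[i]

# Each sample is ([x, y], weight).
samples = [([5, 0.5], 0.1), ([0, 10], 0.3), ([0, -4], 0.3)]
choices = np.array([weighted_choice(samples) for _ in range(1000)])
# seaborn >= 0.11 takes x/y as keywords and fill= instead of shade=.
sns.kdeplot(x=choices[:, 0], y=choices[:, 1], fill=True)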
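Resampling only approximates a weighted KDE. The weights can also go into the estimate directly: a Gaussian KDE is a sum of kernels, one per data point, and each kernel can be scaled by its normalized weight. The following is a rough one-dimensional sketch of that idea using only NumPy. The function name weighted_gaussian_kde, the grid, and the fixed bandwidth of 0.8 are my own assumptions for illustration, not anything pandas or seaborn provides.

```python
import numpy as np

def weighted_gaussian_kde(x, weights, grid, bandwidth):
    """Evaluate a weighted Gaussian KDE on `grid` (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the density integrates to 1
    # One Gaussian kernel per data point, evaluated at every grid point.
    z = (grid[:, None] - x[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2.0 * np.pi))
    # Weighted sum of the kernels.
    return kernels @ w

x = [1, 2, 3, 4]
wt = [7, 5, 3, 1]
grid = np.linspace(-4.0, 9.0, 1301)

density = weighted_gaussian_kde(x, wt, grid, bandwidth=0.8)
unweighted = weighted_gaussian_kde(x, [1, 1, 1, 1], grid, bandwidth=0.8)

# Riemann-sum check that the curve is a proper density.
area = density.sum() * (grid[1] - grid[0])
```

The resulting density can be drawn with any line plot, e.g. plt.plot(grid, density). The bandwidth here is picked by hand; a real analysis would want a proper bandwidth rule. As far as I know, recent SciPy (scipy.stats.gaussian_kde) and seaborn (kdeplot, histplot) releases also accept a weights argument, so check your installed versions before rolling your own.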

answered Sep 30 '22 18:09

Andres Romero
