How to smooth and plot x vs weighted average of y, weighted by x?

I have a dataframe with a column of weights and a column of values. I need to:

1. discretise the weights and, for each interval of weights, plot the weighted average of the values; then
2. extend the same logic to another variable: discretise z and, for each interval, plot the weighted average of the values, weighted by the weights.

Is there an easy way to achieve this? I have found a way, but it seems a bit cumbersome:

1. I discretise the dataframe with pandas.cut(),
2. do a groupby and calculate the weighted average,
3. plot the mean of each bin vs the weighted average.

I have also tried to smooth the curve with a spline, but it doesn't do much.

Basically, I'm looking for a better way to produce a smoother curve.

My output looks like this: [image not available]

and my code, with some random data, is:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.interpolate import make_interp_spline

n = int(1e3)
np.random.seed(10)
df = pd.DataFrame()
df['w'] = np.arange(0, n)
df['v'] = np.random.randn(n)
df['ranges'] = pd.cut(df.w, bins=50)
df['one'] = 1.

def func(x):
    # func() is called once per group; x holds the rows that fall into one bin
    b1 = x['one'].sum()
    b2 = x['w'].mean()
    b3 = x['v'].mean()
    # weighted average of v, weighted by w (NaN if the weights sum to zero)
    b4 = (x['w'] * x['v']).sum() / x['w'].sum() if x['w'].sum() > 0 else np.nan

    cols = ['# items', 'avg w', 'avg v', 'weighted avg v']
    return pd.Series([b1, b2, b3, b4], index=cols)

summary = df.groupby('ranges', observed=False).apply(func)

sns.set(style='darkgrid')

fig, ax = plt.subplots(2)
# recent seaborn versions require x and y as keyword arguments
sns.lineplot(x=summary['avg w'], y=summary['weighted avg v'], ax=ax[0])
ax[0].set_title('line plot')

xnew = np.linspace(summary['avg w'].min(), summary['avg w'].max(), 100)
# make_interp_spline returns a BSpline that passes through every input point
spl = make_interp_spline(summary['avg w'], summary['weighted avg v'], k=5)
power_smooth = spl(xnew)
sns.lineplot(x=xnew, y=power_smooth, ax=ax[1])
ax[1].set_title('not-so-interpolated plot')
plt.show()

asked Apr 01 '19 22:04

Pythonista anonymous

1 Answer

The first part of your question is rather easy to do.

I'm not sure what you mean by the second part. Do you want a (simplified) reproduction of your code, or a new approach that better fits your needs?

In any case, I had to look at your code to understand what you mean by weighting the values. As a warning: I think people would normally expect something different from that term.
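For reference, the weighting in the question's code amounts to sum(w * v) / sum(w) within each bin, where the weights w are also the variable being binned. The small sketch below uses made-up numbers (not the question's data) to show that this formula is the same thing `np.average` computes when given `weights`:

```python
import numpy as np

# Made-up illustrative values, not taken from the question
w = np.array([1.0, 2.0, 3.0])     # weights
v = np.array([10.0, 20.0, 60.0])  # values

# Formula used in the question: sum(w * v) / sum(w)
manual = (w * v).sum() / w.sum()

# NumPy's built-in weighted average
builtin = np.average(v, weights=w)

print(manual, builtin)
```

Both expressions give the same number; note that `np.average` raises an error when the weights sum to zero, whereas the question's code returns NaN in that case.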

Here's the simplified version of your approach:

df['prod_v_w'] = df['v'] * df['w']
# per bin: sum(v * w) / sum(w)
weighted_avg_v = (
    df.groupby(pd.cut(df.w, bins=50))[['prod_v_w', 'w']].sum()
      .eval('prod_v_w / w')
)
print(np.allclose(weighted_avg_v, summary['weighted avg v']))  # True
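If the second part of the question means "bin by another column z, but still weight v by w", the same sum-over-sum idea should carry over. The following is only a sketch under that assumption: the helper name `binned_weighted_avg`, the random `z` column and the rolling window of 5 are all made up for illustration. The last step applies a centred rolling mean, which is one simple way to smooth the binned curve; an interpolating spline passes through every point, so it cannot remove the bin-to-bin noise.

```python
import numpy as np
import pandas as pd

def binned_weighted_avg(df, by, value, weight, bins=50):
    """Hypothetical helper: weighted average of `value` (weights `weight`)
    within equal-width bins of column `by`."""
    tmp = pd.DataFrame({
        "bin": pd.cut(df[by], bins=bins),
        "x": df[by],
        "w": df[weight],
        "wv": df[weight] * df[value],
    })
    g = tmp.groupby("bin", observed=True)  # skip empty bins
    return pd.DataFrame({
        "avg x": g["x"].mean(),
        # sum(w * v) / sum(w); not finite if a bin's weights sum to zero
        "weighted avg": g["wv"].sum() / g["w"].sum(),
    })

# Random demo data; the z column is an assumption, not from the question
rng = np.random.default_rng(10)
n = 1000
demo = pd.DataFrame({
    "w": np.arange(n),
    "v": rng.standard_normal(n),
    "z": rng.uniform(0, 1, n),
})

by_w = binned_weighted_avg(demo, by="w", value="v", weight="w")  # part 1
by_z = binned_weighted_avg(demo, by="z", value="v", weight="w")  # part 2

# Simple smoothing: centred rolling mean over 5 neighbouring bins
by_z["smoothed"] = (
    by_z["weighted avg"].rolling(window=5, center=True, min_periods=1).mean()
)
```

Plotting `by_z["avg x"]` against `by_z["smoothed"]` then gives the smoothed curve. A wider window or fewer bins smooths more, at the cost of detail.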

answered Oct 21 '22 00:10

P.Tillmann

Tags: python, pandas, matplotlib, pandas-groupby, weighted-average