Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to smooth and plot x vs weighted average of y, weighted by x?

I have a dataframe with a column of weights and one of values. I'd need:

  • to discretise weights and, for each interval of weights, plot the weighted average of values, then
  • to extend the same logic to another variable: discretise z, and for each interval, plot the weighted average of values, weighted by weights

Is there an easy way to achieve this?I have found a way, but it seems a bit cumbersome:

  • I discretise the dataframe with pandas.cut()
  • do a groupby and calculate the weighted average
  • plot the mean of each bin vs the weighted average
  • I have also tried to smooth the curve with a spline, but it doesn't do much

Basically I'm looking for a better way to produce a more smoothed curve.

My output looks like this: enter image description here

and my code, with some random data, is:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.interpolate import make_interp_spline, BSpline

n=int(1e3)
df=pd.DataFrame()
np.random.seed(10)
df['w']=np.arange(0,n)
df['v']=np.random.randn(n)
df['ranges']=pd.cut(df.w, bins=50)
df['one']=1.
def func(x, df):
    # func() gets called within a lambda function; x is the row, df is the entire table
    b1= x['one'].sum()
    b2 = x['w'].mean()
    b3 = x['v'].mean()       
    b4=( x['w'] * x['v']).sum() / x['w'].sum() if x['w'].sum() >0 else np.nan

    cols=['# items','avg w','avg v','weighted avg v']
    return pd.Series( [b1, b2, b3, b4], index=cols )

summary = df.groupby('ranges').apply(lambda x: func(x,df))

sns.set(style='darkgrid')

fig,ax=plt.subplots(2)
sns.lineplot(summary['avg w'], summary['weighted avg v'], ax=ax[0])
ax[0].set_title('line plot')

xnew = np.linspace(summary['avg w'].min(), summary['avg w'].max(),100)
spl = make_interp_spline(summary['avg w'], summary['weighted avg v'], k=5) #BSpline object
power_smooth = spl(xnew)
sns.lineplot(xnew, power_smooth, ax=ax[1])
ax[1].set_title('not-so-interpolated plot')
like image 785
Pythonista anonymous Avatar asked Apr 01 '19 22:04

Pythonista anonymous


People also ask

How do you analyze a weighted average?

To find a weighted average, multiply each number by its weight, then add the results. If the weights don't add up to one, find the sum of all the variables multiplied by their weight, then divide by the sum of the weights.

What is the difference between moving average and weighted average?

The main difference between simple moving average, weighted moving average, and exponential moving average is the sensitivity that each shows to changes in the data used. SMA calculates the average price over a specific period, while WMA gives more weight to current data.


1 Answers

The first part of your question is rather easy to do.

I'm not sure what you mean with the second part. Do you want a (simplified) reproduction of your code or a new approach that better fits your need?

Anyway i had to look at your code to understand what you mean by weighting the values. I think people would normally expect something different from the term (just as a warning).

Here's the simplified version of your approach:

df['prod_v_w'] = df['v']*df['w']
weighted_avg_v = df.groupby(pd.cut(df.w, bins=50))[['prod_v_w','w']].sum()\
                   .eval('prod_v_w/w')
print(np.allclose(weighted_avg_v, summary['weighted avg v']))
Out[18]: True
like image 141
P.Tillmann Avatar answered Oct 21 '22 00:10

P.Tillmann