I have a dataframe with a column of weights and one of values. I'd need:
Is there an easy way to achieve this?I have found a way, but it seems a bit cumbersome:
Basically I'm looking for a better way to produce a more smoothed curve.
My output looks like this:
and my code, with some random data, is:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.interpolate import make_interp_spline, BSpline
n=int(1e3)
df=pd.DataFrame()
np.random.seed(10)
df['w']=np.arange(0,n)
df['v']=np.random.randn(n)
df['ranges']=pd.cut(df.w, bins=50)
df['one']=1.
def func(x, df):
# func() gets called within a lambda function; x is the row, df is the entire table
b1= x['one'].sum()
b2 = x['w'].mean()
b3 = x['v'].mean()
b4=( x['w'] * x['v']).sum() / x['w'].sum() if x['w'].sum() >0 else np.nan
cols=['# items','avg w','avg v','weighted avg v']
return pd.Series( [b1, b2, b3, b4], index=cols )
summary = df.groupby('ranges').apply(lambda x: func(x,df))
sns.set(style='darkgrid')
fig,ax=plt.subplots(2)
sns.lineplot(summary['avg w'], summary['weighted avg v'], ax=ax[0])
ax[0].set_title('line plot')
xnew = np.linspace(summary['avg w'].min(), summary['avg w'].max(),100)
spl = make_interp_spline(summary['avg w'], summary['weighted avg v'], k=5) #BSpline object
power_smooth = spl(xnew)
sns.lineplot(xnew, power_smooth, ax=ax[1])
ax[1].set_title('not-so-interpolated plot')
To find a weighted average, multiply each number by its weight, then add the results. If the weights don't add up to one, find the sum of all the variables multiplied by their weight, then divide by the sum of the weights.
The main difference between simple moving average, weighted moving average, and exponential moving average is the sensitivity that each shows to changes in the data used. SMA calculates the average price over a specific period, while WMA gives more weight to current data.
The first part of your question is rather easy to do.
I'm not sure what you mean with the second part. Do you want a (simplified) reproduction of your code or a new approach that better fits your need?
Anyway i had to look at your code to understand what you mean by weighting the values. I think people would normally expect something different from the term (just as a warning).
Here's the simplified version of your approach:
df['prod_v_w'] = df['v']*df['w']
weighted_avg_v = df.groupby(pd.cut(df.w, bins=50))[['prod_v_w','w']].sum()\
.eval('prod_v_w/w')
print(np.allclose(weighted_avg_v, summary['weighted avg v']))
Out[18]: True
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With