How to fit a locally weighted regression in Python so that it can be used to predict on new data?
There is statsmodels.nonparametric.smoothers_lowess.lowess, but it returns the estimates only for the original data set; it seems to do fit and predict together, rather than separately as I expected. scikit-learn always has a fit method that allows the object to be used later on new data with predict, but it doesn't implement lowess.
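A minimal sketch of the mismatch being described (my illustration, with made-up example data): statsmodels' lowess fits and evaluates in one call and returns only the smoothed training points, with no fitted object to call predict on later.

import numpy as np
import statsmodels.api as sm

# made-up example data for illustration only
x = np.linspace(0, 10, 50)
y = np.sin(x) + np.random.normal(scale=0.3, size=50)

# fit and "predict" happen together: the result is an (n, 2) array of
# sorted x values and their smoothed y values for the training data only
smoothed = sm.nonparametric.lowess(y, x, frac=0.3)
print(smoothed.shape)   # (50, 2)

# there is no fitted object with a separate predict() for new x values,
# which is what the answer below works around with interpolation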
lowess is for adding a smooth curve to a scatterplot, i.e., for univariate smoothing. loess is for fitting a smooth surface to multivariate data. Both algorithms use locally-weighted polynomial regression, usually with robustifying iterations.
LOWESS (Locally Weighted Scatterplot Smoothing), sometimes called LOESS (locally weighted smoothing), is a popular tool in regression analysis that draws a smooth line through a time plot or scatter plot to help you see the relationship between variables and spot trends.
LOESS is based on the ideas that any function can be well approximated in a small neighborhood by a low-order polynomial and that simple models can be fit to data easily. High-degree polynomials would tend to overfit the data in each subset and are numerically unstable, making accurate computations difficult.
LOESS and LOWESS filters are popular smoothing methods based on locally weighted regression. They use a weighting function so that the influence of a neighboring observation on the smoothed value at a given position decreases with its distance from that position.
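To make that weighting idea concrete, here is a small sketch (my addition, not taken from statsmodels) of the tricube weight function that classic LOWESS uses inside each local window; x0 and bandwidth are illustrative names, with bandwidth standing for the distance to the farthest neighbor in the window.

import numpy as np

def tricube_weights(x, x0, bandwidth):
    # scaled distance of each observation from the point being smoothed
    d = np.abs(np.asarray(x, dtype=float) - x0) / bandwidth
    # tricube kernel: (1 - d^3)^3 inside the window, zero outside,
    # so influence falls smoothly to zero with distance
    return np.where(d < 1, (1 - d**3)**3, 0.0)

# weights for neighbors of x0 = 5 with a window reaching 4 units out
print(tricube_weights(range(0, 11), x0=5.0, bandwidth=4.0))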
Lowess works great for predicting (when combined with interpolation)! I think the code is pretty straightforward; let me know if you have any questions!
import matplotlib.pyplot as plt
%matplotlib inline
from scipy.interpolate import interp1d
import statsmodels.api as sm
# introduce some floats in our x-values
x = list(range(3, 33)) + [3.2, 6.2]
y = [1,2,1,2,1,1,3,4,5,4,5,6,5,6,7,8,9,10,11,11,12,11,11,10,12,11,11,10,9,8,2,13]
# lowess returns the "smoothed" data as an (n, 2) array sorted by x,
# with a smoothed y value for every x-value in the training data
lowess = sm.nonparametric.lowess(y, x, frac=.3)
# unpack the lowess smoothed points into their x and y values
lowess_x = lowess[:, 0]
lowess_y = lowess[:, 1]
# run scipy's interpolation; bounds_error=False returns NaN outside the fitted range
f = interp1d(lowess_x, lowess_y, bounds_error=False)
xnew = [i / 10. for i in range(400)]
# this generates y values for our new x-values via the interpolator;
# it will MISS values outside the x window (less than 3, greater than 32),
# returning NaN there. One workaround is to clamp out-of-range inputs to
# f(min(lowess_x)) or f(max(lowess_x)); interp1d's fill_value argument can
# do this as well (see the sketch after the plot).
ynew = f(xnew)
plt.plot(x, y, 'o')
plt.plot(lowess_x, lowess_y, '*')
plt.plot(xnew, ynew, '-')
plt.show()
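To tie this back to the fit/predict workflow the question asked for, here is a sketch (my addition, not part of the original answer) that wraps the same lowess-plus-interpolation idea in a small scikit-learn-style class. The class name LowessPredictor is hypothetical; passing a (below, above) tuple as interp1d's fill_value (supported in recent SciPy versions) clamps out-of-range queries to the boundary fit values, replacing the for-loop workaround described in the comments above.

import statsmodels.api as sm
from scipy.interpolate import interp1d

class LowessPredictor:
    """Hypothetical fit/predict wrapper around statsmodels lowess + scipy interp1d."""
    def __init__(self, frac=0.3):
        self.frac = frac

    def fit(self, x, y):
        # lowess returns an (n, 2) array sorted by x: column 0 is x, column 1 is smoothed y
        smoothed = sm.nonparametric.lowess(y, x, frac=self.frac)
        sx, sy = smoothed[:, 0], smoothed[:, 1]
        # clamp queries outside the training range to the boundary fit values
        # (fill_value="extrapolate" is another option in recent SciPy versions)
        self._interp = interp1d(sx, sy, bounds_error=False, fill_value=(sy[0], sy[-1]))
        return self

    def predict(self, x_new):
        return self._interp(x_new)

# usage with the x and y defined in the code above
model = LowessPredictor(frac=0.3).fit(x, y)
print(model.predict([0.5, 10.25, 40.0]))   # clamped, interpolated, clamped

Newer statsmodels releases also document an xvals argument for lowess that evaluates the smoother directly at new points; if your version supports it, that can replace the interpolation step entirely.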