I am using the scikit-learn implementation of Gaussian Process Regression here and I want to fit single points instead of fitting a whole set of points. But the resulting alpha coefficients should remain the same e.g.
gpr2 = GaussianProcessRegressor()
for i in range(x.shape[0]):
gpr2.fit(x[i], y[i])
should be the same as
gpr = GaussianProcessRegressor().fit(x, y)
But when accessing gpr2.alpha_ and gpr.alpha_, they are not the same. Why is that?
Indeed, I am working on a project where new data points arise. I dont want to append the x, y arrays and fit on the whole dataset again as it is very time intense. Let x be of size n, then I am having:
n+(n-1)+(n-2)+...+1 € O(n^2) fittings
when considering that the fitting itself is quadratic (correct me if I'm wrong), the run time complexity should be in O(n^3). It would be more optimal, if I do a single fitting on n points:
1+1+...+1 = n € O(n)
What you refer to is actually called online learning or incremental learning; it is in itself a huge sub-field in machine learning, and is not available out-of-the-box for all scikit-learn models. Quoting from the relevant documentation:
Although not all algorithms can learn incrementally (i.e. without seeing all the instances at once), all estimators implementing the
partial_fitAPI are candidates. Actually, the ability to learn incrementally from a mini-batch of instances (sometimes called “online learning”) is key to out-of-core learning as it guarantees that at any given time there will be only a small amount of instances in the main memory.
Following this excerpt in the linked document above, there is a complete list of all scikit-learn models currently supporting incremental learning, from where you can see that GaussianProcessRegressor is not one of them.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With