I am looking into using a GPR for a rather peculiar context, where I need to write my own kernel. However, I found out there is no documentation about how to do this. Simply inheriting from Kernel and implementing the methods __call__, get_params, diag and is_stationary is enough to get the fitting process to work, but it then breaks down when I try to predict y values and standard deviations. What are the necessary steps to build a minimal but functional class that inherits from Kernel while using its own function? Thanks!
Perhaps the most widely used kernel is the radial basis function (RBF) kernel (also called the exponentiated quadratic kernel, the squared exponential kernel or the Gaussian kernel): k(xₙ, xₘ) = exp(-||xₙ - xₘ||² / (2L²)), where L is the kernel length scale.
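For reference, a plain-NumPy version of that formula for two single points (a small sketch of my own, not from the original post) could look like this:

import numpy as np

def rbf(x_n, x_m, length_scale=1.0):
    # k(x_n, x_m) = exp(-||x_n - x_m||^2 / (2 * L^2))
    sq_dist = np.sum((np.asarray(x_n) - np.asarray(x_m)) ** 2)
    return np.exp(-sq_dist / (2.0 * length_scale ** 2))

print(rbf([0.0, 0.0], [1.0, 1.0]))  # exp(-1) ~ 0.3679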
Depending on how exotic your kernel will be, the answer to your question may be different.
I find the implementation of the RBF kernel quite self-documenting, so I use it as a reference. Here is the gist:
import numpy as np
from sklearn.gaussian_process.kernels import (
    Kernel, Hyperparameter, StationaryKernelMixin, NormalizedKernelMixin)

class RBF(StationaryKernelMixin, NormalizedKernelMixin, Kernel):
    def __init__(self, length_scale=1.0, length_scale_bounds=(1e-5, 1e5)):
        # constructor parameters are stored under the same names
        self.length_scale = length_scale
        self.length_scale_bounds = length_scale_bounds

    @property
    def anisotropic(self):
        # one length scale per feature, or a single shared one
        return np.iterable(self.length_scale) and len(self.length_scale) > 1

    @property
    def hyperparameter_length_scale(self):
        if self.anisotropic:
            return Hyperparameter("length_scale", "numeric",
                                  self.length_scale_bounds,
                                  len(self.length_scale))
        return Hyperparameter(
            "length_scale", "numeric", self.length_scale_bounds)

    def __call__(self, X, Y=None, eval_gradient=False):
        # ...
As you mentioned, your kernel should inherit from Kernel, which requires you to implement __call__, diag and is_stationary. Note that sklearn.gaussian_process.kernels provides StationaryKernelMixin and NormalizedKernelMixin, which implement diag and is_stationary for you (cf. the RBF class definition in the code above).
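Putting these pieces together, here is a minimal sketch of what such a kernel could look like. The kernel itself, k(x, y) = exp(-||x - y|| / length_scale), and the name ExpKernel are stand-ins I picked for illustration; the structure also already follows the get_params and hyperparameter_ conventions discussed below:

import numpy as np
from sklearn.gaussian_process.kernels import (
    Kernel, Hyperparameter, StationaryKernelMixin, NormalizedKernelMixin)
from sklearn.metrics.pairwise import euclidean_distances

class ExpKernel(StationaryKernelMixin, NormalizedKernelMixin, Kernel):
    """Toy kernel k(x, y) = exp(-||x - y|| / length_scale)."""

    def __init__(self, length_scale=1.0, length_scale_bounds=(1e-5, 1e5)):
        # keyword arguments stored under the same names -> get_params works
        self.length_scale = length_scale
        self.length_scale_bounds = length_scale_bounds

    @property
    def hyperparameter_length_scale(self):
        # declares length_scale as a tunable hyperparameter (it appears in theta)
        return Hyperparameter("length_scale", "numeric", self.length_scale_bounds)

    def __call__(self, X, Y=None, eval_gradient=False):
        X = np.atleast_2d(X)
        dists = euclidean_distances(X, X if Y is None else np.atleast_2d(Y))
        K = np.exp(-dists / self.length_scale)
        if eval_gradient:
            if Y is not None:
                raise ValueError("Gradient can only be evaluated when Y is None.")
            # gradient of K with respect to log(length_scale), shape (n, n, 1)
            K_gradient = (K * dists / self.length_scale)[:, :, np.newaxis]
            return K, K_gradient
        return K

With the mixins providing diag and is_stationary, this should be everything GaussianProcessRegressor(kernel=ExpKernel()) needs to fit and to predict both y values and standard deviations.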
You should not overwrite get_params! This is done for you by the Kernel class, which expects scikit-learn kernels to follow a convention that your kernel should follow too: specify your parameters in the signature of your constructor as keyword arguments (see length_scale in the RBF example above). This ensures that your kernel can be copied, which GaussianProcessRegressor.fit(...) does (and could be the reason you could not predict the standard deviation).
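To see what that convention buys you, here is a quick check of my own, using the ExpKernel sketch from above: because every constructor argument is stored under the same attribute name, get_params can reconstruct the kernel and sklearn can clone it.

from sklearn.base import clone

k = ExpKernel(length_scale=2.0)
print(k.get_params())        # contains 'length_scale' and 'length_scale_bounds'
k_copy = clone(k)            # this is what GaussianProcessRegressor.fit(...) relies on
print(k_copy.length_scale)   # 2.0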
At this point you may notice the other parameter, length_scale_bounds. That is only a constraint on the actual hyperparameter length_scale (cf. constrained optimization). This brings us to the fact that you also need to declare the hyperparameters that you want optimized, and for which you need to compute gradients in your __call__ implementation. You do that by defining a property of your class whose name is prefixed by hyperparameter_ (cf. hyperparameter_length_scale in the code). Each hyperparameter that is not fixed (hyperparameter.fixed == False) is returned by Kernel.theta, which the GP uses in fit() and to compute the marginal log likelihood. So this is essential if you want to fit parameters to your data.
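For instance, with the toy ExpKernel sketch from above, the declared hyperparameter shows up like this:

import numpy as np

k = ExpKernel(length_scale=2.0)
print(k.hyperparameters)  # [Hyperparameter(name='length_scale', value_type='numeric', ...)]
print(k.theta)            # [log(2.0)] ~ [0.6931]; flattened and log-transformed
print(np.exp(k.theta))    # [2.0]

# a hyperparameter declared with "fixed" bounds is excluded from theta,
# so the optimizer leaves it alone
k_fixed = ExpKernel(length_scale=2.0, length_scale_bounds="fixed")
print(k_fixed.theta)      # []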
One last detail about Kernel.theta: its implementation states:
Returns the (flattened, log-transformed) non-fixed hyperparameters.
So be careful with 0 values in your hyperparameters, as they can end up as np.nan and break things.
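A tiny illustration of that pitfall, again with the toy ExpKernel from above:

import numpy as np

print(np.log(0.0))               # -inf (NumPy emits a RuntimeWarning)

k = ExpKernel(length_scale=0.0)  # zero-valued hyperparameter
print(k.theta)                   # [-inf]; later arithmetic on it can produce np.nan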
I hope this helps, even though this question is already a bit old. I have actually never implemented a kernel myself, but was eager to skim the sklearn code base. It is unfortunate that there are no official tutorials on this; the code base, however, is quite clean and well commented.