Reproducing R's Gaussian process maximum likelihood regression in Python

I have implemented a function in R to estimate the Gaussian process parameters of a basic sine function. Unfortunately, the project has to be done in Python, and I have been trying to reproduce the behavior of the R library hetGP using scikit-learn, but I am having a hard time mapping the former to the latter.

My understanding of Gaussian processes is still limited, and I am a beginner with sklearn, so I would really appreciate some help on this one.

My R code:

library(hetGP)

set.seed(123)
nvar <- 2
n <- 400
r <- 1
f <- function(x) sin(sum(x))
true_C <- matrix(1/8 * (3 + 2 * cos(2) - cos(4)), nrow = 2, ncol = 2)

design <- matrix(runif(nvar * n), ncol = nvar)
response <- apply(design, 1, f)
# Maximum-likelihood fit of a homoskedastic GP
model <- mleHomGP(design, response, lower = rep(1e-4, nvar), upper = rep(1, nvar))

Later in the code, I use model$Ki and model$theta

model$theta: 0.9396363 0.9669170
dim(model$Ki): 400 400

My Python code so far:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

n = 400
n_var = 2
real_c = np.full((n_var, n_var), 1 / 8 * (3 + 2 * np.cos(2) - np.cos(4)))
design = np.random.uniform(size=n * n_var).reshape(-1, n_var)
test = np.random.uniform(size=n * n_var).reshape(-1, n_var)
response = np.apply_along_axis(lambda x: np.sin(np.sum(x)), 1, design)
kernel = RBF(length_scale=(1, 1))
gpr = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=10,
                               optimizer="fmin_l_bfgs_b").fit(design, response)
gpr.predict(test, return_std=True)
theta = gpr.kernel_.get_params()["length_scale"]
# theta = gpr.kernel_.theta  # note: kernel_.theta is log-transformed
k_inv = gpr._K_inv  # private attribute; may not exist in all scikit-learn versions

Output: theta = [1.78106558 1.80083585]

asked Mar 18 '20 by Alexandre Senges

1 Answer

After more than a week, I finally found an answer (by looking at the Scikit-learn and hetGP implementations).

The two implementations differ on several points:

  • First, the noise level as well as sigma have to be explicitly instantiated in sklearn (via WhiteKernel and ConstantKernel), or they won't be optimized.
  • Furthermore, the best way to recover k_inv is through GaussianProcessRegressor.L_, the lower-triangular Cholesky factor of K.
  • Finally, hetGP doesn't scale its K by sigma, so we have to do it manually.

Here is how I did it:

import scipy.linalg
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, RBF, WhiteKernel


n = 400
n_var = 2

design = np.random.uniform(size=n * n_var).reshape((-1, n_var))
response = np.array([np.sin(np.sum(row)) for row in design])

real_c = np.full((n_var, n_var), 1 / 8 * (3 + 2 * np.cos(2) - np.cos(4)))

# The kernel needs a ConstantKernel for sigma, an n_var-dimensional length_scale,
# and a WhiteKernel so that the noise level gets optimized too
kernel = ConstantKernel(1e-8) * RBF(length_scale=np.array([1.0, 1.0])) + WhiteKernel(noise_level=1)

gpr = GaussianProcessRegressor(kernel=kernel, alpha=1e-10)
gpr.fit(design, response)

# Invert K from its Cholesky factor L_ (K^-1 = L^-T L^-1)
L_inv = scipy.linalg.solve_triangular(gpr.L_.T, np.eye(gpr.L_.shape[0]))
k_inv = L_inv.dot(L_inv.T)
sigma_f = gpr.kernel_.k1.get_params()['k1__constant_value']

theta = gpr.kernel_.k1.get_params()['k2__length_scale']

# hetGP doesn't scale K by sigma, so apply the scaling manually to match model$Ki
Ki = k_inv * sigma_f
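
As a quick sanity check (my own addition, not part of the mapping), k_inv should invert the covariance matrix that sklearn factorized, namely kernel_(design) plus the alpha jitter on the diagonal:

# Optional check: gpr.L_ is the Cholesky factor of K(X, X) + alpha * I,
# where K already includes the WhiteKernel noise on its diagonal
K = gpr.kernel_(design)
K[np.diag_indices_from(K)] += gpr.alpha
print(np.abs(k_inv @ K - np.eye(n)).max())  # should be near 0

Also note that hetGP and sklearn parameterize the Gaussian kernel differently (if I read the hetGP docs correctly, exp(-d²/θ) versus sklearn's exp(-d²/(2ℓ²))), so the fitted length-scales are not expected to match numerically.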
answered Nov 06 '22 by Alexandre Senges