SKlearn: Gaussian Process Regression hyperparameters not changing during fitting

I'm trying to fit a GP using a GaussianProcessRegressor, and I notice that the kernel hyperparameters are still at their initial values after fitting. I did some stepping through gpr.py, but wasn't able to pinpoint the exact reason. Prediction with the initial values yields a zero line.

My data consists of 5,400 samples, each with 12 features, mapped to a single output variable. Even though the design may not be ideal, I would still expect some learning.
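To make "still at their initial values" concrete, a minimal check (just a sketch, using the gp object from the script below) is to compare the kernel that goes in with the fitted kernel_ that comes out:

import numpy as np

# kernel holds the initial hyperparameters, kernel_ the ones after fitting;
# theta is a kernel's vector of log-transformed hyperparameters.
print(gp.kernel.theta)   # values before optimization
print(gp.kernel_.theta)  # values after fitting
print(np.allclose(gp.kernel.theta, gp.kernel_.theta))  # True means nothing moved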

Required files:

features.txt

output.txt

import pandas as pd
import numpy as np
import time
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel,WhiteKernel

designmatrix = pd.read_csv('features.txt', index_col = 0)
y = pd.read_csv('output.txt', header=None, index_col = 0)

# The RBF kernel is a stationary kernel. It is also known as the “squared exponential” kernel. 
# It is parameterized by a length-scale parameter length_scale>0, which can either be a scalar (isotropic variant of the kernel) 
# or a vector with the same number of dimensions as the inputs X (anisotropic variant of the kernel). 
# 
# The ConstantKernel can be used as part of a product-kernel where it scales the magnitude of the other factor (kernel) or as 
# part of a sum-kernel, where it modifies the mean of the Gaussian process.
#
# The main use-case of the White kernel is as part of a sum-kernel where it explains the noise-component of the signal. 
# Tuning its parameter corresponds to estimating the noise-level: k(x_1, x_2) = noise_level if x_1 == x_2 else 0

kernel = ConstantKernel(0.1, (1e-23, 1e5)) * \
         RBF(0.1*np.ones(designmatrix.shape[1]), (1e-23, 1e10)) + \
         WhiteKernel(0.1, (1e-23, 1e5))

gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=0)

print('Training')
t = time.time()
gp = gp.fit(designmatrix, y)
elapsed = time.time() - t
print(elapsed)

score = gp.score(designmatrix, y)
print(score)

print("initial params")
params = gp.get_params()
print(params)
print("learned kernel params")
print(gp.kernel_.get_params())

The result is the following:

initial params

{'alpha': 1e-10, 'copy_X_train': True, 'kernel__k1': 1**2, 'kernel__k2': RBF(length_scale=[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), 'kernel__k1__constant_value': 1.0, 'kernel__k1__constant_value_bounds': (1e-05, 100000.0), 'kernel__k2__length_scale': array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.]), 'kernel__k2__length_scale_bounds': (1e-05, 100000.0), 'kernel': 1**2 * RBF(length_scale=[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), 'n_restarts_optimizer': 0, 'normalize_y': False, 'optimizer': 'fmin_l_bfgs_b', 'random_state': None}

learned kernel params

{'k1': 1**2, 'k2': RBF(length_scale=[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), 'k1__constant_value': 1.0, 'k1__constant_value_bounds': (1e-05, 100000.0), 'k2__length_scale': array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.]), 'k2__length_scale_bounds': (1e-05, 100000.0)}

So, the kernel parameters are unchanged...

  • Is there a way to check for warnings? (A sketch of the kind of check I mean is below this list.)

  • Am I doing something wrong, or is there something I could check?
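On the warnings point above, this is the sort of check I have in mind (just a sketch; I'm assuming fit may be raising warnings that are otherwise suppressed):

import warnings

with warnings.catch_warnings():
    warnings.simplefilter("always")  # surface every warning raised during fitting
    gp.fit(designmatrix, y)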

Any help would be really appreciated...

Ben

asked Mar 06 '23 by Ben

1 Answer

NOT AN ANSWER (YET)

Begin Note

The data is too big for an SO question, and it takes too long for us to test your issue, so I've changed your code to read only the first 600 rows of each file. The code as originally pasted also didn't run (the kernel definition was missing its line continuations); I've fixed that.

End Note

Using Python 3.6.4, scikit-learn==0.19.1, and numpy==1.14.2.

As the documentation of n_restarts_optimizer (quoted below) notes, even the default value of 0 performs one optimization run from the kernel's initial parameters; values greater than 0 add further restarts from randomly sampled hyperparameters (and then require all bounds to be finite), which can help when that first run gets stuck.

n_restarts_optimizer : int, optional (default: 0)
    The number of restarts of the optimizer for finding the kernel's
    parameters which maximize the log-marginal likelihood. The first run
    of the optimizer is performed from the kernel's initial parameters,
    the remaining ones (if any) from thetas sampled log-uniform randomly
    from the space of allowed theta-values. If greater than 0, all bounds
    must be finite. Note that n_restarts_optimizer == 0 implies that one
    run is performed.

With that in mind, changing the value from 0 to 2 in your code results in the following output:

import pandas as pd
import numpy as np
import time
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel,WhiteKernel

designmatrix = pd.read_csv('features.txt', index_col = 0).iloc[0:600,]
y = pd.read_csv('output.txt', header=None, index_col = 0).iloc[0:600,]

# The RBF kernel is a stationary kernel. It is also known as the “squared exponential” kernel. 
# It is parameterized by a length-scale parameter length_scale>0, which can either be a scalar (isotropic variant of the kernel) 
# or a vector with the same number of dimensions as the inputs X (anisotropic variant of the kernel). 
# 
# The ConstantKernel can be used as part of a product-kernel where it scales the magnitude of the other factor (kernel) or as 
# part of a sum-kernel, where it modifies the mean of the Gaussian process.
#
# The main use-case of the White kernel is as part of a sum-kernel where it explains the noise-component of the signal. 
# Tuning its parameter corresponds to estimating the noise-level: k(x_1, x_2) = noise_level if x_1 == x_2 else 0

kernel = ConstantKernel(0.1, (1e-23, 1e5)) * \
         RBF(0.1*np.ones(designmatrix.shape[1]), (1e-23, 1e10) ) + \
         WhiteKernel(0.1, (1e-23, 1e5))

gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=2)

print("initial params")
params = gp.get_params()
print(params)

print('Training')
t = time.time()
gp = gp.fit(designmatrix, y)
elapsed = time.time() - t
print(elapsed)

score = gp.score(designmatrix, y)
print(score)

print("learned kernel params")
print(gp.kernel_.get_params())

And the output:

initial params
{'alpha': 1e-10, 'copy_X_train': True, 'kernel__k1': 0.316**2 * RBF(length_scale=[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]), 'kernel__k2': WhiteKernel(noise_level=0.1), 'kernel__k1__k1': 0.316**2, 'kernel__k1__k2': RBF(length_scale=[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]), 'kernel__k1__k1__constant_value': 0.1, 'kernel__k1__k1__constant_value_bounds': (1e-23, 100000.0), 'kernel__k1__k2__length_scale': array([0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]), 'kernel__k1__k2__length_scale_bounds': (1e-23, 10000000000.0), 'kernel__k2__noise_level': 0.1, 'kernel__k2__noise_level_bounds': (1e-23, 100000.0), 'kernel': 0.316**2 * RBF(length_scale=[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]) + WhiteKernel(noise_level=0.1), 'n_restarts_optimizer': 2, 'normalize_y': False, 'optimizer': 'fmin_l_bfgs_b', 'random_state': None}
Training
3.9108407497406006
1.0
learned kernel params
{'k1': 20.3**2 * RBF(length_scale=[0.00289, 9.29e-15, 8.81e-20, 0.00165, 2.7e+08, 3.2e+06, 0.233, 5.62e+07, 8.78e+07, 0.0169, 4.88e-21, 3.23e-20]), 'k2': WhiteKernel(noise_level=2.17e-13), 'k1__k1': 20.3**2, 'k1__k2': RBF(length_scale=[0.00289, 9.29e-15, 8.81e-20, 0.00165, 2.7e+08, 3.2e+06, 0.233, 5.62e+07, 8.78e+07, 0.0169, 4.88e-21, 3.23e-20]), 'k1__k1__constant_value': 411.28699807005, 'k1__k1__constant_value_bounds': (1e-23, 100000.0), 'k1__k2__length_scale': array([2.88935323e-03, 9.29401433e-15, 8.81112330e-20, 1.64832813e-03,
       2.70454686e+08, 3.20194179e+06, 2.32646715e-01, 5.62487948e+07,
       8.77636837e+07, 1.68642019e-02, 4.88384874e-21, 3.22536538e-20]), 'k1__k2__length_scale_bounds': (1e-23, 10000000000.0), 'k2__noise_level': 2.171274720012903e-13, 'k2__noise_level_bounds': (1e-23, 100000.0)}

Could you please edit your question such that your observation can be reproduced?
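One small addition that would help with reproducing it (a sketch; random_state=0 is an arbitrary choice on my side) is pinning the seed so the randomly sampled restart hyperparameters are the same on every run:

# Pin the seed used to sample the restart thetas, so repeated runs are comparable.
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=2, random_state=0)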

answered Apr 01 '23 by adrin