When using LabelPropagation, I often run into this warning (imho it should be an error, because it completely breaks the propagation):
/usr/local/lib/python3.5/dist-packages/sklearn/semi_supervised/label_propagation.py:279: RuntimeWarning: invalid value encountered in true_divide
  self.label_distributions_ /= normalizer
So after a few tries with the RBF kernel, I discovered that the parameter gamma has an influence.
The problem comes from these lines:
if self._variant == 'propagation':
    normalizer = np.sum(
        self.label_distributions_, axis=1)[:, np.newaxis]
    self.label_distributions_ /= normalizer
I don't get how label_distributions_ can be all zeros, especially when its definition is:
self.label_distributions_ = safe_sparse_dot(
    graph_matrix, self.label_distributions_)
Gamma has an influence on the graph_matrix (because graph_matrix is the result of _build_graph(), which calls the kernel function). OK. But still, something's wrong.
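To see how all-zero rows turn into that warning, here is a minimal sketch (mimicking the lines above, not the actual sklearn code):

import numpy as np

# If every kernel weight underflows to 0.0, graph_matrix is all zeros...
graph_matrix = np.zeros((2, 2))
label_distributions = np.array([[1.0, 0.0], [0.5, 0.5]])

out = graph_matrix.dot(label_distributions)      # ...so this product is all zeros,
normalizer = np.sum(out, axis=1)[:, np.newaxis]  # the normalizer is all zeros,
print(out / normalizer)                          # and 0/0 -> nan, i.e. the RuntimeWarning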
I remind you how graph weights are computed for the propagation: W = exp(-gamma * D), where D is the pairwise distance matrix between all points of the dataset.
The problem is that np.exp(x) returns 0.0 when x is very negative (underflow).
Let's imagine we have two points i and j such that dist(i, j) = 10.
>>> np.exp(np.asarray(-10*40, dtype=float)) # gamma = 40 => OKAY
1.9151695967140057e-174
>>> np.exp(np.asarray(-10*120, dtype=float)) # gamma = 120 => NOT OKAY
0.0
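(For reference, with float64 the exponent only has to drop below roughly -745 before np.exp underflows completely:)

>>> np.exp(-745)  # smallest subnormal double
5e-324
>>> np.exp(-746)  # complete underflow
0.0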
In practice, I'm not setting gamma manually; I'm using the method described in this paper (section 2.4).
The only workaround I can think of is to normalize the dataset in every dimension, but we lose some geometric/topological properties of the dataset (a 2x10 rectangle becoming a 1x1 square, for example).
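For example (a quick sketch with sklearn's MinMaxScaler, just to illustrate the distortion):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Corners of a 2x10 rectangle...
rect = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 10.0], [2.0, 10.0]])
scaled = MinMaxScaler().fit_transform(rect)

# ...become the corners of a 1x1 square: every dimension now spans [0, 1].
print(scaled.max(axis=0) - scaled.min(axis=0))  # [1. 1.]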
In the following example it's even worse: it fails even with gamma = 20.
In [11]: from sklearn.semi_supervised.label_propagation import LabelPropagation
In [12]: import numpy as np
In [13]: X = np.array([[0, 0], [0, 10]])
In [14]: Y = [0, -1]
In [15]: LabelPropagation(kernel='rbf', tol=0.01, gamma=20).fit(X, Y)
/usr/local/lib/python3.5/dist-packages/sklearn/semi_supervised/label_propagation.py:279: RuntimeWarning: invalid value encountered in true_divide
self.label_distributions_ /= normalizer
/usr/local/lib/python3.5/dist-packages/sklearn/semi_supervised/label_propagation.py:290: ConvergenceWarning: max_iter=1000 was reached without convergence.
category=ConvergenceWarning
Out[15]:
LabelPropagation(alpha=None, gamma=20, kernel='rbf', max_iter=1000, n_jobs=1,
n_neighbors=7, tol=0.01)
In [16]: LabelPropagation(kernel='rbf', tol=0.01, gamma=2).fit(X, Y)
Out[16]:
LabelPropagation(alpha=None, gamma=2, kernel='rbf', max_iter=1000, n_jobs=1,
n_neighbors=7, tol=0.01)
Basically you're computing a softmax function, right? The general way to prevent softmax from over/underflowing is (from here):
import numpy as np

# Instead of this . . .
def softmax(x, axis=0):
    return np.exp(x) / np.sum(np.exp(x), axis=axis, keepdims=True)

# Do this
def softmax(x, axis=0):
    e_x = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e_x / e_x.sum(axis=axis, keepdims=True)
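For example, on inputs where the naive version underflows:

>>> softmax(np.array([-1000.0, -1001.0]))  # stable version; the naive one gives [nan, nan] here
array([0.73105858, 0.26894142])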
This bounds e_x between 0 and 1, and ensures that one value of e_x will always be 1 (namely the element at np.argmax(x)). This prevents overflow and underflow (which happen when np.exp(x.max()) is either bigger or smaller than what float64 can handle).
In this case, as you can't change the algorithm, I would take the input D and make D_ = D - D.min(). This should be numerically equivalent to the above, since the largest exponent in W is -gamma * D.min() (you're just flipping the sign), and after the shift that largest exponent becomes 0. Then do your algorithm with regard to D_.
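On the distances from your question, for instance (a sketch with a 1-D array of distances instead of the full matrix):

D = np.array([10.0, 11.0])
gamma = 120

print(np.exp(-gamma * D))              # [0. 0.] -- both weights underflow
print(np.exp(-gamma * (D - D.min())))  # ~[1.0, 7.7e-53] -- largest weight is exp(0) == 1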
EDIT:
As recommended by @PaulBrodersen below, you can build a "safe" rbf kernel based on the sklearn implementation here:
import numpy as np
import sklearn.metrics.pairwise

def rbf_kernel_safe(X, Y=None, gamma=None):
    X, Y = sklearn.metrics.pairwise.check_pairwise_arrays(X, Y)
    if gamma is None:
        gamma = 1.0 / X.shape[1]
    K = sklearn.metrics.pairwise.euclidean_distances(X, Y, squared=True)
    K *= -gamma
    K -= K.max()  # shift so the largest exponent is 0, preventing underflow
    np.exp(K, K)  # exponentiate K in-place
    return K
And then use it in your propagation:
LabelPropagation(kernel=rbf_kernel_safe, tol=0.01, gamma=20).fit(X, Y)
Unfortunately I only have v0.18, which doesn't accept user-defined kernel functions for LabelPropagation, so I can't test it.
EDIT2:
Checking your source for why you have such large gamma values makes me wonder if you are using gamma = D.min()/3, which would be incorrect. The paper's definition is sigma = D.min()/3, while the definition of sigma in w is

w = exp(-d**2 / sigma**2)  # Equation (1)

which would make the correct gamma value 1/sigma**2, or 9/D.min()**2.
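In code (a sketch; d_min stands for the paper's minimal pairwise distance, whatever value you compute for it):

d_min = 10.0            # hypothetical value of the paper's minimal distance
sigma = d_min / 3.0     # the paper's definition of sigma
gamma = 1.0 / sigma**2  # from w = exp(-d**2/sigma**2) == exp(-gamma * d**2)
print(gamma)            # 0.09, i.e. 9 / d_min**2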