Label Propagation in sklearn is classifying every vector as 1

Question

I have 2000 labelled data (7 different labels) and about 100K unlabeled data and I am trying to use sklearn.semi_supervised.LabelPropagation. The data has 1024 dimensions. My problem is that the classifier is labeling everything as 1. My code looks like this:

X_unlabeled = X_unlabeled[:10000, :]
X_both = np.vstack((X_train, X_unlabeled))
y_both = np.append(y_train, -np.ones((X_unlabeled.shape[0],)))
clf = LabelPropagation(max_iter=100).fit(X_both, y_both)
y_pred = clf.predict(X_test)

y_pred is all ones. Also, X_train is 2000x1024 and X_unlabeled is a subset of the unlabeled data which is 10000x1024.

I also get this error upon calling fit on the classifier:

/usr/local/lib/python2.7/site-packages/sklearn/semi_supervised/label_propagation.py:255: RuntimeWarning: invalid value encountered in divide self.label_distributions_ /= normalizer

chloe · Accepted Answer

Have you tried different values for the gamma parameter ? As the graph is constructed by computing an rbf kernel, the computation includes an exponential and the python exponential functions return 0 if the value is a too big negative number (see http://computer-programming-forum.com/56-python/ef71e144330ffbc2.htm). And if the graph is filled with 0, the label_distributions_ is filled with "nan" (because of normalization) and a warning appears. (be careful, the gamma value in scikit implementation is multiplied to the euclidean distance, it's not the same thing as in the Zhu paper.)

Label Propagation in sklearn is classifying every vector as 1

Tags:

machine-learning

scikit-learn

Andrew Danks

1 Answers

chloe

Recent Activity

Donate For Us

Label Propagation in sklearn is classifying every vector as 1

Tags:

machine-learning

scikit-learn

Andrew Danks

1 Answers

chloe

Related questions

Recent Activity

Donate For Us