
LogisticRegression: Unknown label type: 'continuous' using sklearn in python

I have the following code to test some of the most popular ML algorithms in the sklearn Python library:

    import numpy as np
    from sklearn                        import metrics, svm
    from sklearn.linear_model           import LinearRegression
    from sklearn.linear_model           import LogisticRegression
    from sklearn.tree                   import DecisionTreeClassifier
    from sklearn.neighbors              import KNeighborsClassifier
    from sklearn.discriminant_analysis  import LinearDiscriminantAnalysis
    from sklearn.naive_bayes            import GaussianNB
    from sklearn.svm                    import SVC

    trainingData    = np.array([ [2.3, 4.3, 2.5],  [1.3, 5.2, 5.2],  [3.3, 2.9, 0.8],  [3.1, 4.3, 4.0]  ])
    trainingScores  = np.array( [3.4, 7.5, 4.5, 1.6] )
    predictionData  = np.array([ [2.5, 2.4, 2.7],  [2.7, 3.2, 1.2] ])

    clf = LinearRegression()
    clf.fit(trainingData, trainingScores)
    print("LinearRegression")
    print(clf.predict(predictionData))

    clf = svm.SVR()
    clf.fit(trainingData, trainingScores)
    print("SVR")
    print(clf.predict(predictionData))

    clf = LogisticRegression()
    clf.fit(trainingData, trainingScores)
    print("LogisticRegression")
    print(clf.predict(predictionData))

    clf = DecisionTreeClassifier()
    clf.fit(trainingData, trainingScores)
    print("DecisionTreeClassifier")
    print(clf.predict(predictionData))

    clf = KNeighborsClassifier()
    clf.fit(trainingData, trainingScores)
    print("KNeighborsClassifier")
    print(clf.predict(predictionData))

    clf = LinearDiscriminantAnalysis()
    clf.fit(trainingData, trainingScores)
    print("LinearDiscriminantAnalysis")
    print(clf.predict(predictionData))

    clf = GaussianNB()
    clf.fit(trainingData, trainingScores)
    print("GaussianNB")
    print(clf.predict(predictionData))

    clf = SVC()
    clf.fit(trainingData, trainingScores)
    print("SVC")
    print(clf.predict(predictionData))

The first two work fine, but the LogisticRegression call raises the following error:

    root@ubupc1:/home/ouhma# python stack.py
    LinearRegression
    [ 15.72023529   6.46666667]
    SVR
    [ 3.95570063  4.23426243]
    Traceback (most recent call last):
      File "stack.py", line 28, in <module>
        clf.fit(trainingData, trainingScores)
      File "/usr/local/lib/python2.7/dist-packages/sklearn/linear_model/logistic.py", line 1174, in fit
        check_classification_targets(y)
      File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/multiclass.py", line 172, in check_classification_targets
        raise ValueError("Unknown label type: %r" % y_type)
    ValueError: Unknown label type: 'continuous'

The input data is the same as in the previous calls, so what is going on here?

And by the way, why is there such a huge difference between the first predictions of LinearRegression() and SVR() (15.72 vs 3.95)?

Asked by mllamazares on Jan 29 '17 at 19:01





2 Answers

You are passing floats to a classifier, which expects categorical values as the target vector. Converting them to int makes the input acceptable, although it is questionable whether that is the right thing to do.
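To see why the integer cast is questionable, here is a minimal sketch. The first array is taken from the question; the second one holds made-up scores that are not part of the original post. `astype(int)` simply drops the fractional part, so distinct scores can collapse into the same class.

```python
import numpy as np

# Scores from the question.
training_scores = np.array([3.4, 7.5, 4.5, 1.6])

# astype(int) truncates toward zero; it does not round.
truncated = training_scores.astype(int)
print(truncated)  # [3 7 4 1]

# Hypothetical scores, made up for illustration: two different
# values receive the same integer label after the cast.
close_scores = np.array([3.4, 3.9])
collapsed = close_scores.astype(int)
print(collapsed)  # [3 3]
```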

It would be better to convert your training scores using scikit-learn's LabelEncoder class.

The same is true for the other classifiers in your script, such as DecisionTreeClassifier and KNeighborsClassifier.

    from sklearn import preprocessing
    from sklearn import utils

    lab_enc = preprocessing.LabelEncoder()
    encoded = lab_enc.fit_transform(trainingScores)
    # encoded is array([1, 3, 2, 0], dtype=int64)

    print(utils.multiclass.type_of_target(trainingScores))
    # prints: continuous

    print(utils.multiclass.type_of_target(trainingScores.astype('int')))
    # prints: multiclass

    print(utils.multiclass.type_of_target(encoded))
    # prints: multiclass
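As a rough sketch of how the encoded labels could then be used, assuming the same data as in the question (`max_iter=1000` is an arbitrary choice, not something from the original post): fit the classifier on the encoded targets and map its predictions back to scores with `inverse_transform`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

# Data from the question.
training_data = np.array([[2.3, 4.3, 2.5], [1.3, 5.2, 5.2],
                          [3.3, 2.9, 0.8], [3.1, 4.3, 4.0]])
training_scores = np.array([3.4, 7.5, 4.5, 1.6])
prediction_data = np.array([[2.5, 2.4, 2.7], [2.7, 3.2, 1.2]])

# Each distinct score becomes its own integer class.
lab_enc = LabelEncoder()
encoded = lab_enc.fit_transform(training_scores)

# max_iter is an arbitrary, generous value chosen for this sketch.
clf = LogisticRegression(max_iter=1000)
clf.fit(training_data, encoded)

# Predictions are class indices; map them back to the original scores.
predicted_classes = clf.predict(prediction_data)
predicted_scores = lab_enc.inverse_transform(predicted_classes)
print(predicted_scores)
```

Note that this treats every distinct score as a separate class, so the model can only ever return one of the four training scores. If the scores are genuinely continuous, a regressor is probably the better fit.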
Answered by Maximilian Peters on Sep 17 '22 at 03:09



LogisticRegression is for classification, not regression!

The y variable must hold class labels (for example 0 or 1), not a continuous variable. A continuous target would make it a regression problem.
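A minimal sketch of that distinction, assuming the data from the question. The cutoff of 4.0 used to turn the scores into two classes is hypothetical and only there for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Data from the question.
training_data = np.array([[2.3, 4.3, 2.5], [1.3, 5.2, 5.2],
                          [3.3, 2.9, 0.8], [3.1, 4.3, 4.0]])
training_scores = np.array([3.4, 7.5, 4.5, 1.6])

# A continuous target is a regression problem: a regressor accepts it as is.
reg = LinearRegression().fit(training_data, training_scores)

# The same continuous target is rejected by a classifier.
error_message = None
try:
    LogisticRegression().fit(training_data, training_scores)
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Hypothetical cutoff (4.0) that turns the scores into classes 0 and 1.
binary_labels = (training_scores >= 4.0).astype(int)
clf = LogisticRegression().fit(training_data, binary_labels)
print(clf.predict(training_data))
```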

Answered by Tomas G. on Sep 18 '22 at 03:09
