I try to run following code. Btw, I am new to both python and sklearn.
import pandas as pd import numpy as np from sklearn.linear_model import LogisticRegression # data import and preparation trainData = pd.read_csv('train.csv') train = trainData.values testData = pd.read_csv('test.csv') test = testData.values X = np.c_[train[:, 0], train[:, 2], train[:, 6:7], train[:, 9]] X = np.nan_to_num(X) y = train[:, 1] Xtest = np.c_[test[:, 0:1], test[:, 5:6], test[:, 8]] Xtest = np.nan_to_num(Xtest) # model lr = LogisticRegression() lr.fit(X, y)
where y is a np.ndarray of 0's and 1's
I receive the following:
File "C:\Anaconda3\lib\site-packages\sklearn\linear_model\logistic.py", line >1174, in fit check_classification_targets(y)
File "C:\Anaconda3\lib\site-packages\sklearn\utils\multiclass.py", line 172, >in check_classification_targets raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'unknown'
from sklearn documentation: http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression.fit
y : array-like, shape (n_samples,) Target values (class labels in classification, real numbers in regression)
What is my error?
upd:
y is array([0.0, 1.0, 1.0, ..., 0.0, 1.0, 0.0], dtype=object) size is (891,)
The way to resolve this error is to simply convert the continuous values of the response variable to categorical values using the LabelEncoder() function from sklearn: What is this? Each of the original values is now encoded as a 0 or 1.
ValueError: Unknown label type: 'continuous' Essentially, the error is telling us that the type of the target variable is continuous which is not compatible with the specific model we are trying to fit (i.e. LogisticRegression ).
Your y
is of type object
, so sklearn cannot recognize its type. Add the line y=y.astype('int')
right after the line y = train[:, 1]
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With