I am partitioning 500 samples out a 10,000+ row dataset just for sake of simplicity. Please copy and paste X and y into your IDE.
X =
array([ -8.93, -0.17, 1.47, -6.13, -4.06, -2.22, -2.11, -0.25,
0.25, 0.49, 1.7 , -0.77, 1.07, 5.61, -11.95, -3.8 ,
-3.42, -2.55, -2.44, -1.99, -1.7 , -0.98, -0.91, -0.91,
-0.25, 1.7 , 2.88, -6.9 , -4.07, -1.35, -0.33, 0.63,
0.98, -3.31, -2.61, -2.61, -2.17, -1.38, -0.77, -0.25,
-0.08, -1.2 , -3.1 , -1.07, -0.7 , -0.41, -0.33, 0.41,
0.77, 0.77, 1.14, 2.17, -7.92, -3.8 , -2.11, -2.06,
-1.2 , -1.14, 0. , 0.56, 1.47, -1.99, -0.17, 2.44,
-5.87, -3.74, -3.37, -2.88, -0.49, -0.25, -0.08, 0.33,
0.33, 0.84, 1.64, 2.06, 2.88, -4.58, -1.82, -1.2 ,
0.25, 0.25, 0.63, 2.61, -5.36, -1.47, -0.63, 0. ,
0.63, 1.99, 1.99, -10.44, -2.55, 0.33, -8.93, -5.87,
-5.1 , -2.78, -0.25, 1.47, 1.93, 2.17, -5.36, -5.1 ,
-3.48, -2.44, -2.06, -2.06, -1.82, -1.58, -1.58, -0.63,
-0.33, 0. , 0.17, -3.31, -0.25, -5.1 , -3.8 , -2.55,
-1.99, -1.7 , -0.98, -0.91, -0.63, -0.25, 0.77, 0.91,
0.91, -9.43, -8.42, -2.72, -2.55, -1.26, 0.7 , 0.77,
1.07, 1.47, 1.7 , -1.82, -1.47, 0.17, 1.26, -5.36,
-1.52, -1.47, -0.17, -3.48, -3.31, -2.06, -1.47, 0.17,
0.25, 1.7 , 2.5 , -9.94, -6.08, -5.87, -3.37, -2.44,
-2.17, -1.87, -0.98, -0.7 , -0.49, 0.41, 1.47, 2.28,
-14.95, -12.44, -6.39, -4.33, -3.8 , -2.72, -2.17, -1.2 ,
0.41, 0.77, 0.84, 2.51, -1.99, -1.7 , -1.47, -1.2 ,
0.49, 0.63, 0.84, 0.98, 1.14, 2.5 , -2.06, -1.26,
-0.33, 0.17, 4.58, -7.41, -5.87, 1.2 , 1.38, 1.58,
1.82, 1.99, -6.39, -2.78, -2.67, -1.87, -1.58, -1.47,
0.84, -10.44, -7.41, -3.05, -2.17, -1.07, -1.07, -0.91,
0.25, 1.82, 2.88, -6.9 , -1.47, 0.33, -8.42, -3.8 ,
-1.99, -1.47, -1.47, -0.56, 0.17, 0.17, 0.25, 0.56,
4.58, -3.48, -2.61, -2.44, -0.7 , 0.63, 1.47, 1.82,
-13.96, -9.43, -2.67, -1.38, -0.08, 0. , 1.82, 3.05,
-4.58, -3.31, -0.98, -0.91, -0.7 , 0.77, -0.7 , -0.33,
0.56, 1.58, 1.7 , 2.61, -4.84, -4.84, -4.32, -2.88,
-1.38, -0.98, -0.17, 0.17, 0.49, 2.44, 4.32, -3.48,
-3.05, 0.56, -8.42, -3.48, -2.61, -2.61, -2.06, -1.47,
-0.98, 0. , 0.08, 1.38, 1.93, -9.94, -2.72, -1.87,
-1.2 , -1.07, 1.58, 4.58, -6.64, -2.78, -0.77, -0.7 ,
-0.63, 0.49, 1.07, -8.93, -4.84, -1.7 , 1.76, 3.31,
-11.95, -3.16, -3.05, -1.82, -0.49, -0.41, 0.56, 1.58,
-13.96, -3.05, -2.78, -2.55, -1.7 , -1.38, -0.91, -0.33,
1.2 , 1.32, 1.47, -2.06, -1.82, -7.92, -6.33, -4.32,
-3.8 , -1.93, -1.52, -0.98, -0.49, -0.33, 0.7 , 1.52,
1.76, -8.93, -7.41, -2.88, -2.61, -2.33, -1.99, -1.82,
-1.64, -0.84, 1.07, 2.06, -3.96, -2.44, -1.58, 0. ,
-3.31, -2.61, -1.58, -0.25, 0.33, 0.56, 0.84, 1.07,
-1.58, -0.25, 1.35, -1.99, -1.7 , -1.47, -1.47, -0.84,
-0.7 , -0.56, -0.33, 0.56, 0.63, 1.32, 2.28, 2.28,
-2.72, -0.25, 0.41, -6.9 , -4.42, -4.32, -1.76, -1.2 ,
-1.14, -1.07, 0.56, 1.32, 1.52, -14.97, -7.41, -5.1 ,
-2.61, -1.93, -0.98, 0.17, 0.25, 0.41, -4.42, -2.61,
-0.91, -0.84, 2.39, -2.61, -1.32, 0.41, -6.9 , -5.61,
-4.06, -3.31, -1.47, -0.91, -0.7 , -0.63, 0.33, 1.38,
2.61, -2.29, 3.06, 4.44, -10.94, -4.32, -3.42, -2.17,
-1.7 , -1.47, -1.32, -1.07, -0.7 , 0. , 0.77, 1.07,
-3.31, -2.88, -2.61, -1.47, -1.38, -0.63, -0.49, 1.07,
1.52, -3.8 , -1.58, -0.91, -0.7 , 0.77, 3.42, -8.42,
-2.88, -1.76, -1.76, -0.63, -0.25, 0.49, 0.63, -6.9 ,
-4.06, -1.82, -1.76, -1.76, -1.38, -0.91, -0.7 , 0.17,
1.38, 1.47, 1.47, -11.95, -0.98, -0.56, -14.97, -9.43,
-8.93, -2.72, -2.61, -1.64, -1.32, -0.56, -0.49, 0.91,
1.2 , 1.47, -3.8 , -3.06, -2.51, -1.04, -0.33, -0.33,
-3.31, -3.16, -3.05, -2.61, -1.47, -1.07, 2.17, 3.1 ,
-2.61, -0.25, -3.85, -2.44])
y =
array([1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0,
1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1,
0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1,
1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0,
0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1,
1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1,
1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0,
0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1,
0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1,
0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1,
1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1,
1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0,
0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1,
1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0,
1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1,
0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0,
1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1,
1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1,
0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0,
0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0,
0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0,
0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0,
1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1])
Initialization & Training:
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X, y)
Cross-validate:
from sklearn.model_selection import cross_val_score
cross_val_score(model, X, y, cv=10, scoring='r2').mean()
-0.3339677563815496 (Negative R2?)
To see if it's close to true R2 of model. I did this:
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=None, shuffle=False)
r2_score(y_test, model.predict_proba(X_test)[:,1], multioutput='variance_weighted')
0.32642659661798396
This R2 makes more sense for goodness-of-fit of the model, and it looks like the two R2 is a just +/- sign switch, but it is not. In my model using a much bigger sample, R2 cross-val is -0.24 and R2 test is 0.18. And, when I add a feature that seems to benefit model, R2 test goes up and R2 cross-val decreases
Also, if you switch LogisticRegression to LinearRegression, R2 cross-val is now positive and is close to R2 test. What is causing this issue?
TLDR: R2 can be negative and you are misinterpeting the train_test_split
results.
I will explain both statements below.
cross_val_score
sign flipping for error
and loss
metricsFrom the docs, you can see that cross_val_score
actually flips the sign for some metrics. But only for error
or loss
metrics (lower-is-better), not for score
metrics (higher-is-better):
All scorer objects follow the convention that higher return values are better than lower return values. Thus metrics which measure the distance between the model and the data, like metrics.mean_squared_error, are available as neg_mean_squared_error which return the negated value of the metric.
Since r2
is a score
metric, it's not flipping the sign. You are getting a -0.33
in cross-validation. Note that this is normal. From r2_score
docs:
Best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a R^2 score of 0.0.
So this leads us to the second part: why are you getting so different results using CV and train/test splits?
There are two reasons why you are getting better results with train_test_split
.
Evaluating r2
on the probabilities and not on the classes (you are using predict_proba
instead of predict
makes errors less harmful:
print(r2_score(y_test, model.predict_proba(X_test)[:,1], multioutput='variance_weighted'))
0.19131536389654913
While:
print(r2_score(y_test, model.predict(X_test)))
-0.364200082678793
Taking the mean of the 10
folds cv, without checking the variance, which is high. If you check the variance and the details of the results, you will see that the variance is huge:
scores = cross_val_score(model, X, y, cv=10, scoring='r2')
scores
array([-0.67868339, -0.03918495, 0.04075235, -0.47783251, -0.23152709,
-0.39573071, -0.72413793, -0.66666667, 0. , -0.16666667])
scores.mean(), scores.std() * 2
(-0.3339677563815496, 0.5598543351649792)
Hope it helped!
It is possible for R2 to be negative. The following paragraph is from wikipedia page of "Coefficient of determination"
There are cases where the computational definition of R2 can yield negative values, depending on the definition used. This can arise when the predictions that are being compared to the corresponding outcomes have not been derived from a model-fitting procedure using those data. Even if a model-fitting procedure has been used, R2 may still be negative, for example when linear regression is conducted without including an intercept, or when a non-linear function is used to fit the data. In cases where negative values arise, the mean of the data provides a better fit to the outcomes than do the fitted function values, according to this particular criterion. Since the most general definition of the coefficient of determination is also known as the Nash–Sutcliffe model efficiency coefficient, this last notation is preferred in many fields, because denoting a goodness-of-fit indicator that can vary from -infinity to 1 (i.e., it can yield negative values) with a squared letter is confusing.
It seems that the prediction is worse than a horizontal line.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With