I'm new to machine learning and am trying to use the linear model estimators that scikit-learn provides to predict the price of a used car. I tried different combinations of linear models, like LinearRegression, Ridge, Lasso and ElasticNet, but in most cases all of them return a negative score (-0.6 <= score <= 0.1).
Someone told me that this is because of a multicollinearity problem, but I don't know how to solve it.
My sample code:
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sqlalchemy import create_engine

engine = create_engine('sqlite:///path-to-db')
query = "SELECT mileage, carcass, engine, transmission, state, drive, customs_cleared, price FROM cars WHERE mark='some mark' AND model='some model' AND year='some year'"
df = pd.read_sql_query(query, engine)
df = df.dropna()
# shuffle the rows before splitting into train/test sets
df = df.reindex(np.random.permutation(df.index))
X_full = df[['mileage', 'carcass', 'engine', 'transmission', 'state', 'drive', 'customs_cleared']]
y_full = df['price']
# hold out the last 20% of the rows as a test set;
# note the integer division: -len(X_full)/5 is a float in Python 3 and breaks slicing
n_train = -(len(X_full) // 5)
X_train = X_full[:n_train]
X_test = X_full[n_train:]
y_train = y_full[:n_train]
y_test = y_full[n_train:]
predict = [[200000, 0, 2.5, 0, 0, 2, 0]]  # parameters of the car to predict (2D: one sample)
model = Ridge(alpha=1.0)
model.fit(X_train, y_train)
y_estimate = model.predict(X_test)
print("Mean squared error: %.2f" % np.mean((y_estimate - y_test) ** 2))
print("Variance score: %.2f" % model.score(X_test, y_test))
print("Predicted price: ", model.predict(predict))
carcass, state, drive and customs_cleared are numeric codes that represent categorical types.
What is the correct way to implement this prediction? Maybe some data preprocessing, or a different algorithm?
Thanks in advance for any advice!
Interpreting linear regression coefficients: a positive coefficient indicates that as the value of the independent variable increases, the mean of the dependent variable also tends to increase. A negative coefficient suggests that as the independent variable increases, the dependent variable tends to decrease.
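For example (a minimal sketch, assuming model is the fitted Ridge estimator from the question), you can inspect the sign of each fitted coefficient like this:

# print each feature's coefficient; the sign shows its direction of effect on price
feature_names = ['mileage', 'carcass', 'engine', 'transmission', 'state', 'drive', 'customs_cleared']
for name, coef in zip(feature_names, model.coef_):
    print("%s: %+.4f" % (name, coef))
print("intercept: %+.4f" % model.intercept_)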
The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a score of 0.0.
In practice, R² will be negative whenever your model's predictions are worse than a constant function that always predicts the mean of the data.
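As a tiny illustration with made-up numbers (using sklearn.metrics.r2_score, which is what model.score computes for regressors):

from sklearn.metrics import r2_score

y_true = [100, 200, 300]         # mean is 200
y_pred = [300, 200, 100]         # worse than always predicting the mean
print(r2_score(y_true, y_pred))  # -3.0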
Given that you are using Ridge regression, you should scale your variables, e.g. with StandardScaler or MinMaxScaler:
http://scikit-learn.org/stable/modules/preprocessing.html#standardization-or-mean-removal-and-variance-scaling
Perhaps using a Pipeline:
http://scikit-learn.org/stable/modules/pipeline.html#pipeline-chaining-estimators
If you were using vanilla linear regression, scaling wouldn't matter; but with Ridge regression, the regularization penalty term (weighted by alpha) treats differently scaled variables differently. See this discussion on Cross Validated:
https://stats.stackexchange.com/questions/29781/when-should-you-center-your-data-when-should-you-standardize
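A minimal sketch of that idea, assuming the same X_train/X_test split as in the question (alpha=1.0 is just a placeholder):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

# scale each feature to zero mean and unit variance before the penalized fit,
# so the alpha penalty acts on comparably scaled coefficients
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(X_train, y_train)
print("Variance score: %.2f" % model.score(X_test, y_test))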