
Negative accuracy score in regression models with Scikit-Learn

I wrote code that predicts house prices. The problem is that I'm getting a negative accuracy score. I have used 5 different algorithms, and the accuracy score is all over the place.

The first problem is that I get a warning when I use the .map function, but I do not think that is the problem here.

The regression models work, but their train and test accuracy are all over the place. I have also tried this:

from sklearn.metrics import accuracy_score ... score_train = regression.accuracy_score(variables_train, result_train) ...

but it showed me this error: AttributeError: 'LinearRegression' object has no attribute 'accuracy_score'

You can download the dataset from here:

https://www.sendspace.com/file/93nkdy

This is the code:

import pandas as pd
from sklearn import linear_model
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

from sklearn.model_selection import train_test_split

#pandas display options
pd.set_option('display.max_rows', 70)
pd.set_option('display.max_columns', 100)
pd.set_option('display.width', 1000)

data = pd.read_csv("validate.csv")
data = data.drop(columns = ["id"])

data = data.dropna(axis='columns')

data_for_pred = data[["bedrooms_total", "baths_total",
                        "sq_ft_tot_fn", "garage_capacity",
                        "city", "total_stories", "rooms_total",
                        "garage", "flood_zone","price_closed"]]

#to see how many different values I have 
cities =  data_for_pred['city'].unique()
garage = data_for_pred['garage'].unique()
flood_zone = data_for_pred['flood_zone'].unique()

#mapping so that I can do my regression
data_for_pred['city'] = data_for_pred['city'].map({'Woodstock': 1, 'Barnard': 2, 'Pomfret': 3})
data_for_pred['garage'] = data_for_pred['garage'].map({'No': 0, 'Yes': 1})
data_for_pred['flood_zone'] = data_for_pred['flood_zone'].map({'Unknown': 0, 'Yes': 1, 'No': -1})

#print(data_for_pred)

def regression_model(bedrooms_num, baths_num, sq_ft_tot, garage_cap,
                    city, total_stor, rooms_tot, garage, flood_zone):


    classifiers = [
        ["Linear regression", linear_model.LinearRegression()],
        ["Support vector regression", SVR(gamma = 'auto')],
        ["Decision tree regression", DecisionTreeRegressor()],
        ["SVR - RBF", SVR(kernel = "rbf", C = 1e3, gamma = 0.1)],
        ["SVR - Linear regression", SVR(kernel = "linear", C = 1e0)]]

    variables = data_for_pred.iloc[:,:-1]
    results = data_for_pred.iloc[:,-1]

    predictionData = [bedrooms_num, baths_num, sq_ft_tot, garage_cap, city,
                      total_stor, rooms_tot, garage, flood_zone]

    info = ""

    for item in classifiers:

        regression = item[1]

        variables_train, variables_test, result_train, result_test = train_test_split(variables, results , test_size = 0.2, random_state = 4)

        regression.fit(variables_train, result_train)

        #Prediction
        prediction = regression.predict([predictionData])
        prediction = round(prediction[0], 2)

        #Accuracy of prediction
        score_train = regression.score(variables_train, result_train)
        score_train = round(score_train*100, 2)

        score_test = regression.score(variables_test, result_test)
        score_test = round(score_test*100, 2)

        info += str(item[0]) + " prediction: " + str(prediction) + " | Train accuracy: " + str(score_train) + "% | Test accuracy: " + str(score_test) + "%\n"

    return info


print(regression_model(7, 8, 4506, 0, 1, 2.00, 15, 0, 0)) #true value 375000
print(regression_model(8, 8, 5506, 0, 1, 2.00, 15, 0, 0)) #true value more than 375000
asked Jun 16 '19 17:06 by taga


1 Answer

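Accuracy is defined only for classification problems. Here you have a regression problem.

The .score method of LinearRegression (and of the other regressors you use) returns the coefficient of determination R^2 of the prediction, not the accuracy. R^2 is at most 1.0 and becomes negative when the model predicts worse than a constant model that always predicts the mean of y, which is why you see negative values.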

score(self, X, y[, sample_weight]) Returns the coefficient of determination R^2 of the prediction.

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html
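To make the negative values less mysterious, here is a minimal sketch of how R^2 is computed, written in plain Python rather than with scikit-learn. The helper name r2 and the three prices are made up for illustration; they are not taken from the dataset in the question.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of a constant model that always predicts the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical prices, for illustration only.
y_true = [100.0, 200.0, 300.0]

perfect = r2(y_true, [100.0, 200.0, 300.0])   # predictions match exactly
baseline = r2(y_true, [200.0, 200.0, 200.0])  # always predicts the mean
bad = r2(y_true, [300.0, 200.0, 100.0])       # worse than the mean

print(perfect, baseline, bad)  # 1.0 0.0 -3.0
```

A score below zero therefore only says that the model is doing worse on that data than always guessing the average price. Multiplying it by 100 and calling it a percentage accuracy, as in the question, is what makes it look impossible.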

EDIT

You can use this IF YOU PREDICT LABELS (CLASSIFICATION problem).

from sklearn.metrics import accuracy_score
# y_true and y_pred must both be arrays of class labels of the same length
scores_classification = accuracy_score(y_true, y_pred)
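As a rough illustration of why accuracy does not carry over to regression, the sketch below computes accuracy by hand as the fraction of exact matches. The helper name accuracy and all values are hypothetical. With class labels the number is meaningful; with continuous prices, a prediction almost never equals the true value exactly, so the result is useless even when the predictions are close.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly equal the true value."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Classification: labels either match or they do not.
labels_true = ["cheap", "expensive", "cheap", "expensive"]
labels_pred = ["cheap", "expensive", "expensive", "expensive"]
label_acc = accuracy(labels_true, labels_pred)

# Regression: close predictions still count as misses.
prices_true = [375000.0, 410000.0, 298000.0]
prices_pred = [371250.5, 415100.0, 301999.9]
price_acc = accuracy(prices_true, prices_pred)

print(label_acc, price_acc)  # 0.75 0.0
```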

IF YOU PREDICT SCALAR VALUES (REGRESSION problem), which is your case, you should use regression metrics like:

from sklearn import metrics
scores_regr = metrics.mean_squared_error(y_true, y_pred)

All regression scoring methods are here: https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics
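For a fuller picture, here is a sketch of evaluating a regressor on a held-out test set with several of those metrics. It assumes NumPy and scikit-learn are installed, and it uses synthetic data as a stand-in for the housing data, so the feature, the coefficients and the noise level are all invented for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the housing data (hypothetical values).
rng = np.random.default_rng(0)
X = rng.uniform(500, 5000, size=(200, 1))            # e.g. square footage
y = 100 * X[:, 0] + rng.normal(0, 20000, size=200)   # price with noise

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=4)

model = LinearRegression().fit(X_train, y_train)

# Predict for the whole test set, not for a single hand-written row.
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
rmse = mse ** 0.5                           # same units as the price
mae = mean_absolute_error(y_test, y_pred)   # average absolute error
r2 = r2_score(y_test, y_pred)               # same value as model.score(X_test, y_test)

print(f"RMSE: {rmse:.0f} | MAE: {mae:.0f} | R^2: {r2:.3f}")
```

RMSE and MAE are expressed in the units of the target (here, the price), which usually makes them easier to interpret than R^2 when comparing models.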

EDIT 2

Use:

from sklearn.metrics import mean_squared_error
score_train = mean_squared_error(result_train, regression.predict(variables_train))
answered Nov 08 '22 00:11 by seralouk