
Check the accuracy of decision tree classifier with Python

I wrote a function that takes a dataset (Excel / pandas) and some values, and then predicts the outcome with a decision tree classifier from sklearn. How can I check the accuracy of this classifier? I have searched the web and this site, but I couldn't find an answer that works. I tried the following, but it does not work:

from sklearn.metrics import accuracy_score
score = accuracy_score(variable_list, result_list)

This is the error that I get:

ValueError: Classification metrics can't handle a mix of continuous-multioutput and multiclass targets

This is the code (I removed the accuracy part):

import pandas as pd
import math
import xlrd
from sklearn.model_selection import train_test_split
from sklearn import tree

def predict_concrete_class(input_data, cement, blast_fur_slug,fly_ash,
                            water, superpl, coarse_aggr, fine_aggr, days):

    data_for_tree = concrete_strenght_class(input_data)

    variable_list = []
    result_list = []

    for index, row in data_for_tree.iterrows():
        variable = row.tolist()
        variable = variable[0:8]

        variable_list.append(variable)

        result_list.append(row.iloc[-1])  # last column holds the class label

    decision_tree = tree.DecisionTreeClassifier()
    decision_tree = decision_tree.fit(variable_list,result_list)

    input_values = [cement, blast_fur_slug, fly_ash, water, superpl, coarse_aggr, fine_aggr, days]

    prediction = decision_tree.predict([input_values])

    info = "Prediction of future concrete class after "+ str(days)+" days: "+ str(prediction[0])

    return info

print(predict_concrete_class(data, 500, 0, 0, 200, 0, 1125, 613, 3))

taga asked Mar 04 '23 10:03


2 Answers

  1. Split your data into train and test:

    var_train, var_test, res_train, res_test = train_test_split(variable_list, result_list, test_size = 0.3)
    
  2. Train your decision tree on train set:

    decision_tree = tree.DecisionTreeClassifier()
    decision_tree = decision_tree.fit(var_train, res_train)
    
  3. Test model performance by calculating accuracy on test set:

    res_pred = decision_tree.predict(var_test)
    score = accuracy_score(res_test, res_pred)
    

    Or you could directly use decision_tree.score:

    score = decision_tree.score(var_test, res_test)
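
For reference, the three steps can be combined into one self-contained sketch. The DataFrame below is a made-up stand-in for whatever `concrete_strenght_class(input_data)` returns (assumed here to be eight feature columns followed by one label column), and the labelling rule, the column name `concrete_class` and `random_state=0` are illustrative assumptions, not part of the original code:

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for the output of concrete_strenght_class(input_data):
# eight numeric feature columns followed by one class-label column.
rng = np.random.default_rng(0)
feature_names = ["cement", "blast_fur_slug", "fly_ash", "water",
                 "superpl", "coarse_aggr", "fine_aggr", "days"]
data_for_tree = pd.DataFrame(rng.uniform(0, 500, size=(300, 8)),
                             columns=feature_names)

# Made-up labelling rule, only so that the tree has something to learn.
data_for_tree["concrete_class"] = pd.cut(
    data_for_tree["cement"], bins=[0, 150, 350, 500],
    labels=["low", "medium", "high"], include_lowest=True,
).astype(str)

# Column slicing replaces the row-by-row loop: the first eight columns are
# the features, the last column is the label.
X = data_for_tree.iloc[:, :8]
y = data_for_tree.iloc[:, -1]

# 1. Hold out 30% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# 2. Train on the training rows only.
decision_tree = DecisionTreeClassifier(random_state=0)
decision_tree.fit(X_train, y_train)

# 3. Compare true labels with predicted labels on the held-out rows.
y_pred = decision_tree.predict(X_test)
score = accuracy_score(y_test, y_pred)
print(f"Accuracy on the held-out rows: {score:.3f}")
```

Fixing `random_state` only makes the split and the tree reproducible between runs; drop it if you want a different random split each time.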
    

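The error occurs because you pass variable_list (your list of input features) to accuracy_score. Each entry of variable_list is a row of eight numeric values, which sklearn interprets as a continuous-multioutput target, while result_list holds class labels (multiclass), hence the "mix" in the message. accuracy_score expects the true labels as its first argument and the predicted labels as its second.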
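
If you want to see where the two target types in the message come from, something like the following should reproduce it on a tiny scale. The numbers and the class names ("C20", "C30", "C40") are made up for illustration; `type_of_target` is the sklearn helper that classifies what kind of target an array looks like:

```python
from sklearn.metrics import accuracy_score
from sklearn.utils.multiclass import type_of_target

# Made-up values: rows of non-integer features and three distinct class labels.
variable_list = [[540.5, 162.0], [332.5, 228.0], [198.6, 192.3]]
result_list = ["C20", "C30", "C40"]

print(type_of_target(variable_list))  # continuous-multioutput
print(type_of_target(result_list))    # multiclass

# Passing features where labels are expected triggers the ValueError.
error_message = None
try:
    accuracy_score(variable_list, result_list)
except ValueError as exc:
    error_message = str(exc)
    print(error_message)

# Correct usage: true labels first, predicted labels second.
true_labels = ["C20", "C30", "C40"]
predicted_labels = ["C20", "C30", "C30"]  # hypothetical model output
score = accuracy_score(true_labels, predicted_labels)
print(score)  # two of three predictions match
```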

panktijk answered Apr 26 '23 23:04


You should perform cross-validation if you want to check the accuracy of your system.

You have to split your data set into two parts. The first part is used to train your system. You then run the prediction on the second part and compare the predicted results with the true ones. This way, you evaluate your system on data it has not seen during training. Cross-validation repeats this procedure over several different splits and averages the results.

To split your set, use train_test_split from sklearn.model_selection, which splits the data randomly.

Here is a good article on the topic: https://machinelearningmastery.com/k-fold-cross-validation/
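
As a rough sketch of what k-fold cross-validation looks like in sklearn, `cross_val_score` does the repeated splitting, training and scoring for you. The data here is synthetic (generated with `make_classification`), and the choice of 5 folds and `random_state=0` is an assumption for the example, not something taken from the question:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 300 samples, 8 features, 3 classes.
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           n_classes=3, random_state=0)

# Each of the 5 folds is used once as the test set while the remaining
# four folds are used for training; the default score for a classifier
# is accuracy.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)

print("Accuracy per fold:", scores)
print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```

The mean over the folds is usually a more stable estimate than the accuracy from a single random split, at the cost of training the model several times.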

Ghislain Moreau answered Apr 27 '23 00:04