I am trying to load training and test data from a csv, run the random forest regressor in scikit/sklearn, and then predict the output from the test file.
The TrainLoanData.csv file contains 5 columns; the first column is the output and the next 4 columns are the features. The TestLoanData.csv contains 4 columns - the features.
When I run the code, I get this error:
predicted_probs = ["%f" % x[1] for x in predicted_probs]
IndexError: invalid index to scalar variable.
What does this mean?
Here is my code:
import numpy, scipy, sklearn, csv_io  # csv_io from https://raw.github.com/benhamner/BioResponse/master/Benchmarks/csv_io.py
from sklearn import datasets
from sklearn.ensemble import RandomForestRegressor

def main():
    # read in the training file
    train = csv_io.read_data("TrainLoanData.csv")
    # set the training responses
    target = [x[0] for x in train]
    # set the training features
    train = [x[1:] for x in train]
    # read in the test file
    realtest = csv_io.read_data("TestLoanData.csv")
    # random forest code
    rf = RandomForestRegressor(n_estimators=10, min_samples_split=2, n_jobs=-1)
    # fit the training data
    print('fitting the model')
    rf.fit(train, target)
    # run model against test data
    predicted_probs = rf.predict(realtest)
    print(predicted_probs)
    predicted_probs = ["%f" % x[1] for x in predicted_probs]
    csv_io.write_delimited_file("random_forest_solution.csv", predicted_probs)

main()
A random forest regressor. A random forest is a meta estimator that fits a number of classifying decision trees on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
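The averaging described there can be seen directly: a fitted forest exposes its individual trees via the `estimators_` attribute, and for a regressor `predict` is simply the mean of the per-tree predictions. A minimal sketch (toy data, not the asker's CSV):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy training data for illustration only.
X = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
y = [-1.0, 1.0, 3.0]

rf = RandomForestRegressor(n_estimators=10, random_state=0)
rf.fit(X, y)

# The forest's prediction is the average of the individual trees' predictions.
per_tree = np.array([tree.predict(X) for tree in rf.estimators_])
print(np.allclose(per_tree.mean(axis=0), rf.predict(X)))  # True
```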
Unfortunately, the Random Forest can't extrapolate the linear trend and accurately predict new examples that have a time value higher than that seen in the training data (2000–2010). Even adjusting the number of trees doesn't fix the problem.
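That limitation is easy to demonstrate on a toy linear trend (an illustrative sketch, not the asker's data): train on inputs below 10, then ask for a prediction at 20. The forest can only average values it saw during training, so the prediction saturates near the training maximum:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy linear trend y = 2x, observed only for x in [0, 10).
X_train = np.arange(0, 10, 0.5).reshape(-1, 1)
y_train = 2.0 * X_train.ravel()

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

# Inside the training range the fit is fine (close to the true value 10)...
print(rf.predict([[5.0]]))

# ...but at x = 20 the true value is 40, while the forest cannot predict
# anything above the largest target it saw (19), regardless of n_estimators.
print(rf.predict([[20.0]]))
```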
The return value from a RandomForestRegressor is an array of floats:
In [3]: rf = RandomForestRegressor(n_estimators=10, min_samples_split=2, n_jobs=-1)
In [4]: rf.fit([[1,2,3],[4,5,6]],[-1,1])
Out[4]:
RandomForestRegressor(bootstrap=True, compute_importances=False,
criterion='mse', max_depth=None, max_features='auto',
min_density=0.1, min_samples_leaf=1, min_samples_split=2,
n_estimators=10, n_jobs=-1, oob_score=False,
random_state=<mtrand.RandomState object at 0x7fd894d59528>,
verbose=0)
In [5]: rf.predict([1,2,3])
Out[5]: array([-0.6])
In [6]: rf.predict([[1,2,3],[4,5,6]])
Out[6]: array([-0.6, 0.4])
So you're trying to index a float, like (-0.6)[1], which is not possible.
As a side note, the model does not return probabilities.
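Concretely, the fix is to drop the [1] indexing and format each scalar directly. A sketch with stand-in data in place of the asker's CSV files:

```python
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(n_estimators=10, random_state=0)
rf.fit([[1, 2, 3], [4, 5, 6]], [-1.0, 1.0])

# predict() returns a 1-D array with one float per test row, so each
# element is already a scalar: format it directly instead of indexing x[1].
realtest = [[1, 2, 3], [4, 5, 6]]
predicted = ["%f" % x for x in rf.predict(realtest)]
print(predicted)
```

If you actually wanted class probabilities, that is a classification task: RandomForestClassifier has a predict_proba method that returns one row of probabilities per sample, and there indexing with [1] would make sense.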
First, it always helps to include sample data so others can reproduce and debug your problem. If the data are too large or confidential, extract a small representative part.
The contents of the variable predicted_probs are not what you expect. It is a flat list (or array) of scalar predictions, one per test row, and that is exactly what predict is supposed to return.
In sklearn, an estimator's fit() method takes the training data and the corresponding targets (class labels for a classifier, numeric values for a regressor). The predict() method then takes only the validation data and returns the prediction results, i.e., one value per sample in the validation data.
If you want to know how good the trained model is, you must not just train and predict once: do cross-validation, i.e., repeatedly train and validate, each time checking how many predictions were correct (or, for regression, how small the error is). scikit-learn has excellent documentation; I'm sure you will find the relevant section. If not, ask me.
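With a current scikit-learn, that repeated train/validate loop is one call to cross_val_score. A sketch using generated stand-in data where the question would use the train/target lists loaded from TrainLoanData.csv:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Stand-in data: 100 samples with 4 features, like the asker's CSV layout.
X, y = make_regression(n_samples=100, n_features=4, noise=0.1, random_state=0)

rf = RandomForestRegressor(n_estimators=10, random_state=0)

# 5-fold cross-validation; for a regressor the default score is R^2,
# so one score per held-out fold is returned.
scores = cross_val_score(rf, X, y, cv=5)
print(scores)
print(scores.mean())
```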