I am trying to run scikit-learn's random forest classification on 279,900 instances with 5 attributes and 1 class. I get a memory allocation error at the fit line; it is not able to train the classifier at all. Any suggestions on how to resolve this issue?
The data is:
x, y, day, week, Accuracy
x and y are coordinates, day is the day of the month (1-30), week is the day of the week (1-7), and Accuracy is an integer.
code:

import csv
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Read the class labels (column 8) from the CSV.
with open("time_data.csv", "rb") as infile:
    re1 = csv.reader(infile)
    result = []
    ##next(reader, None)
    ##for row in reader:
    for row in re1:
        result.append(row[8])

trainclass = result[:251900]
testclass = result[251901:279953]

# Read the five feature columns from the same file.
with open("time_data.csv", "rb") as infile:
    re = csv.reader(infile)
    coords = [(float(d[1]), float(d[2]), float(d[3]), float(d[4]), float(d[5])) for d in re if len(d) > 0]

train = coords[:251900]
test = coords[251901:279953]
print "Done splitting data into test and train data"

clf = RandomForestClassifier(n_estimators=500, max_features="log2", min_samples_split=3, min_samples_leaf=2)
clf.fit(train, trainclass)
print "Done training"
score = clf.score(test, testclass)
print "Done Testing"
print score
Error:
line 366, in fit
builder.build(self.tree_, X, y, sample_weight, X_idx_sorted)
File "sklearn/tree/_tree.pyx", line 145, in sklearn.tree._tree.DepthFirstTreeBuilder.build
File "sklearn/tree/_tree.pyx", line 244, in sklearn.tree._tree.DepthFirstTreeBuilder.build
File "sklearn/tree/_tree.pyx", line 735, in sklearn.tree._tree.Tree._add_node
File "sklearn/tree/_tree.pyx", line 707, in sklearn.tree._tree.Tree._resize_c
File "sklearn/tree/_utils.pyx", line 39, in sklearn.tree._utils.safe_realloc
MemoryError: could not allocate 10206838784 bytes
From the scikit-learn docs: "The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values."
I would start by adjusting those parameters, as sketched below. You can also try a memory profiler, or run the job on Google Colaboratory if your machine has too little RAM.
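For example, a minimal sketch of what that could look like; the specific values here are illustrative guesses to tune against your own data, not known-good settings:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# scikit-learn's trees operate on float32 internally, so passing a dense
# float32 array avoids an extra conversion copy of the training data.
X_train = np.asarray(train, dtype=np.float32)
y_train = np.asarray(trainclass)

clf = RandomForestClassifier(
    n_estimators=100,      # fewer trees than the original 500
    max_depth=20,          # cap depth; the default (None) grows each tree fully
    max_features="log2",
    min_samples_split=10,  # coarser splits mean fewer nodes per tree
    min_samples_leaf=5,
)
clf.fit(X_train, y_train)

Fewer and shallower trees trade some accuracy for a much smaller model: with ~280,000 rows, a fully grown tree can hold a huge number of nodes, and building 500 of them at once is what exhausts memory in your traceback.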
Please try Google Colaboratory. You can connect to a local runtime or a hosted one. It worked for me with n_estimators=10000.