MemoryError when fitting scikit-learn Decision Tree and Random Forest Classifiers

I have a pandas DataFrame with 86k rows, 5 features and 1 target column. I'm trying to train a DecisionTreeClassifier using 70% of the DataFrame as train data, and I get a MemoryError from the fit method. I've tried changing some of the parameters but I don't really know what's causing the error so I don't know how to handle it. I'm on Windows 10 with 8GB of RAM.

Code

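from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# data is the 86k-row DataFrame described above
train, test = train_test_split(data, test_size=0.3)
X_train = train.iloc[:, 1:-1]  # first column is not a feature
y_train = train.iloc[:, -1]    # last column is the target
X_test = test.iloc[:, 1:-1]
y_test = test.iloc[:, -1]

DT = DecisionTreeClassifier()
DT.fit(X_train, y_train)  # MemoryError is raised here
dt_predictions = DT.predict(X_test)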

Error

File (...), line 97, in <module>
DT.fit(X_train, y_train)
File "(...)\AppData\Local\Programs\Python\Python36-32\lib\site-packages\sklearn\tree\tree.py", line 790, in fit
X_idx_sorted=X_idx_sorted)
File "(...)\AppData\Local\Programs\Python\Python36-32\lib\site-packages\sklearn\tree\tree.py", line 362, in fit
builder.build(self.tree_, X, y, sample_weight, X_idx_sorted)
File "sklearn\tree\_tree.pyx", line 145, in sklearn.tree._tree.DepthFirstTreeBuilder.build
File "sklearn\tree\_tree.pyx", line 244, in sklearn.tree._tree.DepthFirstTreeBuilder.build
File "sklearn\tree\_tree.pyx", line 735, in sklearn.tree._tree.Tree._add_node
File "sklearn\tree\_tree.pyx", line 707, in sklearn.tree._tree.Tree._resize_c
File "sklearn\tree\_utils.pyx", line 39, in sklearn.tree._utils.safe_realloc
MemoryError: could not allocate 671612928 bytes

The same error occurs when I try RandomForestClassifier, always on the line that calls fit. How can I solve this?

asked Jun 21 '18 18:06

julia

1 Answer

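I've been running into the same issue. Make sure you're dealing with a classification problem and not a regression problem. If your target column is continuous, the classifier will likely treat every distinct value as its own class, which can make the tree's per-node class arrays very large. In that case you probably want http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html instead of RandomForestClassifier (and, for a single tree, DecisionTreeRegressor instead of DecisionTreeClassifier).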
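A rough way to check which case you are in is to inspect the target before fitting. The sketch below is only an illustration: pick_tree_estimator and its max_classes threshold are hypothetical names made up for this example (not part of scikit-learn), and the two arrays are synthetic stand-ins for a real target column. It relies on sklearn.utils.multiclass.type_of_target, which reports "continuous" for non-integer float targets.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.utils.multiclass import type_of_target


def pick_tree_estimator(y, max_classes=50):
    """Return a regressor for continuous-looking targets, else a classifier.

    Hypothetical helper: max_classes is an arbitrary illustrative threshold,
    not a scikit-learn rule.
    """
    kind = type_of_target(y)        # e.g. "binary", "multiclass", "continuous"
    n_unique = np.unique(y).size    # number of distinct target values
    if kind.startswith("continuous") or n_unique > max_classes:
        return DecisionTreeRegressor()
    return DecisionTreeClassifier()


# Synthetic targets standing in for the real y_train (assumption for the demo).
rng = np.random.default_rng(0)
y_continuous = rng.normal(size=1000)         # floats, almost all distinct
y_labels = rng.integers(0, 3, size=1000)     # a handful of class labels

for name, y in [("continuous", y_continuous), ("labels", y_labels)]:
    estimator = pick_tree_estimator(y)
    print(name, type_of_target(y), np.unique(y).size, type(estimator).__name__)
```

With your own data you would pass y_train instead of the synthetic arrays. If the number of distinct values is close to the number of rows, the target is almost certainly meant for a regressor.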

answered Nov 02 '22 13:11

Teuszie

Tags: python, machine-learning, scikit-learn, decision-tree
