Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Training a RandomForest is slow on a computing cluster

I have an account to a computing cluster than runs on Linux. I'm using scikit-learn to train a Random Forest classifier with 1000 trees on a very large dataset. I tried to use all the cores of the computing cluster by running the following code:

clf = RandomForestClassifier(n_estimators=1000, n_jobs=-1)
clf.fit(data, Y)

However when I run the code I see that only 1.2% of the CPUs are being used! So why it's not using all the cores that exists? And how to solve this please?

Edit: I saw that my problem could be relevant to the one in this link, but I wasn't able to understand the solution. https://github.com/scikit-learn/scikit-learn/issues/1053

like image 368
Jack Twain Avatar asked Oct 17 '25 16:10

Jack Twain


1 Answers

This might not be the root of your problem (as n_jobs=-1 should automatically detect and use all cores in your master node) but Sklearn will run in parallel in all cores of a single machine in your cluster. By default it will NOT run on cores from different machines in your cluster as this would imply knowing about the architecture of your cluster and communicating via the network which sklearn doesn't know how to do, as it varies from cluster to cluster.

For this you will have to use a solution like ipython parallel. See the excellent tutorial by Oliver Grisel if you want to use the full power of your cluster.

I recommend that you update sklearn to the latest version, try your code locally(ideally under the same OS, sklearn version), debug the scaling behavior and CPU utilization by setting n_jobs=1,2,3... and benchmarking the fit. For example if n_jobs=1 doesn't have a high utilization rate in one core in the cluster but it does in your local PC, this would indicate a problem with the cluster and not with the code. Sometimes the top command in a cluster behaves differently, you should consult this with the admin.

like image 97
elyase Avatar answered Oct 20 '25 07:10

elyase



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!