 

How to limit number of CPU's used by a python script w/o terminal or multiprocessing library?

My main problem is described here. Since no one has offered a solution yet, I have decided to find a workaround. I am looking for a way to limit a Python script's CPU usage (not its priority, but the number of CPU cores it uses) from within Python code. I know I can do that with the multiprocessing library (Pool, etc.), but I am not the one running the script with multiprocessing, so I can't configure it there. I could also do it from the terminal, but this script is imported by another script, so unfortunately I don't have the luxury of launching it from a terminal.

tl;dr: How can I limit the CPU usage (number of cores) of a Python script that is imported by another script, without running it from a terminal? I don't even know why it runs in parallel. Please check the code snippet below.

The code snippet causing the issue:

from sklearn.datasets import load_digits
from sklearn.decomposition import IncrementalPCA
import numpy as np

X, _ = load_digits(return_X_y=True)

#Copy-paste and increase the size of the dataset to see the behavior in htop.
for _ in range(8):
    X = np.vstack((X, X))

print(X.shape)

transformer = IncrementalPCA(n_components=7, batch_size=200)

#PARTIAL FIT RUNS IN PARALLEL! GOD WHY?
transformer.partial_fit(X[:100, :])
X_transformed = transformer.fit_transform(X)

print(X_transformed.shape)
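The parallelism here most likely comes from the multithreaded BLAS library (OpenBLAS or MKL) that numpy links against, not from scikit-learn or joblib: partial_fit calls dense linear-algebra routines (matrix products, SVD) that the BLAS parallelizes internally. A minimal sketch that reproduces the same multi-core behavior with numpy alone (the matrix size is arbitrary):

```python
import numpy as np

# A plain numpy matrix product exercises the same multithreaded BLAS
# (OpenBLAS or MKL) that IncrementalPCA uses under the hood; watch htop
# while it runs and it spreads across cores unless BLAS threads are capped.
a = np.random.rand(2000, 2000)
b = a @ a
print(b.shape)  # (2000, 2000)
```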

Versions:

  • Python 3.6
  • joblib 0.13.2
  • scikit-learn 0.20.2
  • numpy 1.16.2

UPDATE: This doesn't work. Thank you for the clarification, @Darkonaut. The sad thing is that I already knew this wouldn't work, and I clearly stated it in the question title, but people don't read, I guess. I guess I am doing it wrong. I've updated the code snippet based on @Ben Chaliah Ayoub's answer; nothing seems to have changed. I also want to point something out: I am not trying to run this code on multiple cores. The line transformer.partial_fit(X[:100, :]) runs on multiple cores (for some reason), and it doesn't have an n_jobs parameter or anything like it. Please also note that my first example and my original code are not initialized with a pool or anything similar, so I couldn't set the number of cores in the first place (because there was no place to set it). Now there is such a place, but it is still running on multiple cores. Feel free to test it yourself (code below). That's why I am looking for a workaround.

from sklearn.datasets import load_digits
from sklearn.decomposition import IncrementalPCA
import numpy as np
from multiprocessing import Pool, cpu_count
def run_this():
    X, _ = load_digits(return_X_y=True)
    #Copy-paste and increase the size of the dataset to see the behavior in htop.
    for _ in range(8):
        X = np.vstack((X, X))
    print(X.shape)
    #This is the exact same example taken from scikit-learn's IncrementalPCA documentation.
    transformer = IncrementalPCA(n_components=7, batch_size=200)
    transformer.partial_fit(X[:100, :])
    X_transformed = transformer.fit_transform(X)
    print(X_transformed.shape)
pool= Pool(processes=1)
pool.apply(run_this)

UPDATE: So, I have tried to set the BLAS thread counts in my code before importing numpy, using the environment variables below, but it didn't work (again). Any other suggestions? The latest version of the code can be found below.

Credits: @Amir

from sklearn.datasets import load_digits
from sklearn.decomposition import IncrementalPCA
import os

# Note: sklearn (imported above) has already pulled in numpy, so by the time
# these variables are set the BLAS thread pools may already be initialized.
# They need to be set before the very first numpy import to take effect.
os.environ["OMP_NUM_THREADS"] = "1" # export OMP_NUM_THREADS=1
os.environ["OPENBLAS_NUM_THREADS"] = "1" # export OPENBLAS_NUM_THREADS=1
os.environ["MKL_NUM_THREADS"] = "1" # export MKL_NUM_THREADS=1
os.environ["VECLIB_MAXIMUM_THREADS"] = "1" # export VECLIB_MAXIMUM_THREADS=1
os.environ["NUMEXPR_NUM_THREADS"] = "1" # export NUMEXPR_NUM_THREADS=1

import numpy as np

X, _ = load_digits(return_X_y=True)

#Copy-paste and increase the size of the dataset to see the behavior in htop.
for _ in range(8):
    X = np.vstack((X, X))

print(X.shape)
transformer = IncrementalPCA(n_components=7, batch_size=200)

transformer.partial_fit(X[:100, :])

X_transformed = transformer.fit_transform(X)

print(X_transformed.shape)
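If setting environment variables is impractical (for example, because numpy has already been imported by the time your module runs), the third-party threadpoolctl package can cap the BLAS thread pools at runtime. A sketch, assuming `pip install threadpoolctl` (the array size is arbitrary):

```python
import numpy as np
from threadpoolctl import threadpool_limits  # third-party: pip install threadpoolctl

a = np.random.rand(500, 500)

# Temporarily cap every BLAS thread pool (OpenBLAS/MKL) to a single thread;
# all numpy/scikit-learn linear algebra inside this block runs on one core.
with threadpool_limits(limits=1, user_api="blas"):
    b = a @ a

print(b.shape)  # (500, 500)
```

Because this works at runtime, it can wrap the offending call (e.g. `transformer.partial_fit(...)`) even when the script is imported by another script.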
Asked by MehmedB on Apr 18 '19.



1 Answer

I am looking for a way to limit a python scripts CPU usage (not priority but the number of CPU cores) with python code.

Run your application with taskset or numactl.

For example, to make your application utilize only the first 4 CPUs do:

taskset --cpu-list 0-3 <app>

These tools, however, limit the process to specific CPUs, not to a total number of CPUs. For best results they require those CPUs to be isolated from the OS process scheduler, so that the scheduler doesn't run any other processes on them. Otherwise, if the specified CPUs are currently running other threads while other CPUs sit idle, your threads won't be able to migrate to the idle CPUs and will have to queue up for the specified ones, which isn't ideal.
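Since the question asks for a solution without a terminal: on Linux the same CPU pinning can be done from inside Python via os.sched_setaffinity (a sketch; the CPU numbers are examples, and this call is Linux-only):

```python
import os

# Pin the current process (pid 0 = self) to CPU core 0 from within Python.
# Equivalent to `taskset --cpu-list 0`, but needs no terminal.
original = os.sched_getaffinity(0)   # remember the currently allowed CPU set
os.sched_setaffinity(0, {0})         # now all threads may only run on core 0
limited = os.sched_getaffinity(0)
print(limited)                       # {0}
os.sched_setaffinity(0, original)    # restore the original affinity
```

Because affinity applies to the whole process, this also constrains the BLAS worker threads, regardless of how many threads the library spawns.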

Using cgroups you can limit your processes/threads to use a specific fraction of available CPU resources without limiting to specific CPUs, but cgroups setup is less trivial.
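One relatively low-friction way to get a cgroup CPU quota, assuming a systemd-based distribution (the script name is a placeholder):

```shell
# Give the process at most one full CPU's worth of time, on any cores.
# CPUQuota=200% would allow two CPUs' worth, and so on.
systemd-run --scope -p CPUQuota=100% python3 your_script.py
```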

Answered by Maxim Egorushkin on Oct 22 '22.