I have a string-processing job in Python that I would like to speed up with a thread pool. The individual string-processing tasks have no dependencies on each other, and each result is stored in a MongoDB database.
I wrote my code as follows:
import multiprocessing
from multiprocessing.pool import ThreadPool

def _process(s):
    # Do stuff: pure-Python string manipulation.
    # Save the output to a database (PyMongo).
    ...

thread_pool_size = multiprocessing.cpu_count()
pool = ThreadPool(thread_pool_size)
for single_string in string_list:
    pool.apply_async(_process, [single_string])
pool.close()
pool.join()
I ran the code on a Linux machine with 8 CPU cores, and after the job had been running for a few minutes the maximum CPU usage was only around 130% (as read from top).
Is my approach of using a thread pool correct? Is there a better way to do this?
Perhaps _process isn't CPU bound; it might be slowed by the file system or the network, since you're writing to a database. You could check whether CPU usage rises when you make the function truly CPU bound, for example:
def _process(s):
    # Deliberately CPU bound: pure bytecode, no I/O.
    for i in range(100000000):  # xrange on Python 2
        j = i * i
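As a quick illustration of what to expect (a standalone sketch rather than the asker's code: the inputs are dummies and the iteration count is trimmed so the test finishes quickly), timing that CPU-bound function through the same kind of thread pool should still leave top near 100% on CPython, because the GIL lets only one thread execute Python bytecode at a time:

import multiprocessing
import time
from multiprocessing.pool import ThreadPool

def _process(s):
    # CPU-bound stand-in, as above, with a shorter loop.
    for i in range(10 ** 7):
        j = i * i

pool = ThreadPool(multiprocessing.cpu_count())
start = time.time()
for single_string in ['x'] * 8:  # dummy inputs
    pool.apply_async(_process, [single_string])
pool.close()
pool.join()
print('elapsed: %.1fs' % (time.time() - start))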
You might also consider using multiple processes instead of multiple threads. Because of CPython's Global Interpreter Lock (GIL), only one thread can execute Python bytecode at a time, so a thread pool cannot spread pure-Python work across multiple CPUs. To take full advantage of your machine, use a process pool instead of a thread pool.
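Here is a minimal sketch of the question's loop rewritten with a process pool (the function body and input list are placeholders; note that a PyMongo MongoClient should be created inside each worker, since connections must not be shared across a fork):

import multiprocessing

def _process(s):
    # Runs in a separate worker process with its own GIL. Create the
    # PyMongo MongoClient here, inside the worker, rather than sharing
    # one from the parent process across the fork.
    return s.upper()  # placeholder for the real string manipulation

if __name__ == '__main__':
    string_list = ['foo', 'bar', 'baz']  # placeholder input
    pool = multiprocessing.Pool(multiprocessing.cpu_count())
    for single_string in string_list:
        pool.apply_async(_process, [single_string])
    pool.close()
    pool.join()

Unlike threads, each worker here is a separate interpreter with its own GIL, so all eight cores can run Python bytecode at once. Keep in mind that _process must be defined at module top level so the pool can pickle it, and that exceptions raised inside apply_async tasks only surface if you keep the AsyncResult and call .get() on it.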