I have a Python script which works on the following scheme: read a large file (e.g., a movie), compose selected information from it into a number of small temporary files, spawn a C++ application in subprocesses to perform the file processing/calculations (separately for each file), and read the application output. To speed up the script I used multiprocessing. However, it has a major drawback: each process has to keep a copy of the whole large input file in RAM, so I can run only a few processes before I run out of memory. I therefore decided to try multithreading instead (or some combination of multiprocessing and multithreading), since threads share the address space. As the Python part spends most of its time on file I/O or waiting for the C++ application to complete, I thought the GIL should not be an issue here. Nevertheless, instead of a performance gain I observe a drastic slowdown, mainly in the I/O part.
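To make the scheme concrete, here is a rough sketch of it (the movie.dat file name, the chunk size and the ./cpp_app binary are simplified placeholders for illustration, not my actual code):

import os, subprocess, tempfile, multiprocessing

def process_chunk(chunk_lines):
    # Compose the selected information into a small temporary file.
    with tempfile.NamedTemporaryFile('w', delete=False) as tmp:
        tmp.writelines(chunk_lines)
    try:
        # Run the external C++ application on that file and read its output.
        return subprocess.check_output(['./cpp_app', tmp.name])
    finally:
        os.remove(tmp.name)

if __name__ == '__main__':
    # Read the large input file once.
    with open('movie.dat') as f:
        lines = f.readlines()
    # Split the selected information into chunks, one per temporary file.
    chunks = [lines[i:i + 100000] for i in range(0, len(lines), 100000)]
    # Process the chunks in parallel with a pool of worker processes.
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(process_chunk, chunks)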
I illustrate the problem with the following code (saved as test.py):
import sys, threading, tempfile, time

nthreads = int(sys.argv[1])

class IOThread(threading.Thread):
    def __init__(self, thread_id, obj):
        threading.Thread.__init__(self)
        self.thread_id = thread_id
        self.obj = obj

    def run(self):
        run_io(self.thread_id, self.obj)

def gen_object(nlines):
    obj = []
    for i in range(nlines):
        obj.append(str(i) + '\n')
    return obj

def run_io(thread_id, obj):
    ntasks = 100 // nthreads + (1 if thread_id < 100 % nthreads else 0)
    for i in range(ntasks):
        tmpfile = tempfile.NamedTemporaryFile('w+')
        with open(tmpfile.name, 'w') as ofile:
            for elem in obj:
                ofile.write(elem)
        with open(tmpfile.name, 'r') as ifile:
            content = ifile.readlines()
        tmpfile.close()

obj = gen_object(100000)
starttime = time.time()
threads = []
for thread_id in range(nthreads):
    threads.append(IOThread(thread_id, obj))
    threads[thread_id].start()
for thread in threads:
    thread.join()
runtime = time.time() - starttime
print('Runtime: {:.2f} s'.format(runtime))
When I run it with different numbers of threads, I get this:
$ python3 test.py 1
Runtime: 2.84 s
$ python3 test.py 1
Runtime: 2.77 s
$ python3 test.py 1
Runtime: 3.34 s
$ python3 test.py 2
Runtime: 6.54 s
$ python3 test.py 2
Runtime: 6.76 s
$ python3 test.py 2
Runtime: 6.33 s
Can someone explain this result to me, and give some advice on how to effectively parallelize I/O using multithreading?
EDIT:
The slowdown is not due to HDD performance, because:
1) the files are cached in RAM anyway, and
2) the same operations with multiprocessing (not multithreading) do indeed get faster (almost by a factor equal to the number of CPUs); see the sketch after this list.
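For reference, the multiprocessing variant I mean in point 2 is structured roughly like this (a simplified sketch, not the exact benchmark code):

import sys, tempfile, time, multiprocessing

def run_io(ntasks, obj):
    # The same per-worker I/O loop as in test.py, but run in a separate process.
    for _ in range(ntasks):
        tmpfile = tempfile.NamedTemporaryFile('w+')
        with open(tmpfile.name, 'w') as ofile:
            ofile.writelines(obj)
        with open(tmpfile.name, 'r') as ifile:
            content = ifile.readlines()
        tmpfile.close()

if __name__ == '__main__':
    nprocs = int(sys.argv[1])
    obj = [str(i) + '\n' for i in range(100000)]
    starttime = time.time()
    procs = []
    for proc_id in range(nprocs):
        # Split the 100 tasks across the worker processes.
        ntasks = 100 // nprocs + (1 if proc_id < 100 % nprocs else 0)
        p = multiprocessing.Process(target=run_io, args=(ntasks, obj))
        p.start()
        procs.append(p)
    for p in procs:
        p.join()
    print('Runtime: {:.2f} s'.format(time.time() - starttime))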
As I delved deeper into the problem, I ran comparison benchmarks for 4 different parallelisation methods, 3 of them using Python and 1 using Java (the purpose of the test was not to compare the I/O machinery between languages, but to see whether multithreading can boost I/O operations). The test was performed on Ubuntu 14.04.3, with all files placed on a RAM disk.
Although the data are quite noisy, a clear trend is evident (see the chart; n=5 for each bar, error bars represent SD): Python multithreading fails to boost I/O performance. The most probable reason is the GIL, and therefore there seems to be no way around it.