Python multiprocessing pool.map raises IndexError

I've developed a utility using Python/Cython that sorts CSV files and generates stats for a client, but invoking pool.map seems to raise an exception before my mapped function has a chance to execute. Sorting a small number of files works as expected, but once the number of files grows to around 10, I get the IndexError below after calling pool.map. Does anyone recognize this error? Any help is greatly appreciated.

While the code is under NDA, the use-case is fairly simple:

Code Sample:

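import multiprocessing

def sort_files(csv_files):
    # One worker per CPU core; chunksize=1 hands files to workers one at a time.
    pool_size = multiprocessing.cpu_count()
    pool = multiprocessing.Pool(processes=pool_size)
    sorted_dicts = pool.map(sort_file, csv_files, 1)
    return sorted_dicts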

def sort_file(csv_file):
    print 'sorting %s...' % csv_file
    # sort code

Output:

File "generic.pyx", line 17, in generic.sort_files (/users/cyounker/.pyxbld/temp.linux-x86_64-2.7/pyrex/generic.c:1723)
    sorted_dicts = pool.map(sort_file, csv_files, 1)
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 227, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 528, in get
    raise self._value
IndexError: list index out of range
asked Dec 21 '12 19:12 by Cryo


2 Answers

The IndexError is raised somewhere inside sort_file(), i.e. in a worker subprocess, and is then re-raised by the parent process. Apparently multiprocessing (on Python 2.7, at least) makes no attempt to tell us where the error really comes from (e.g. on which line it occurred) or even which argument to sort_file() caused it. I hate multiprocessing even more :-(
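
One possible workaround, sketched here under assumptions: wrap the mapped function so that a failure in the worker is re-raised with the offending argument and the worker-side traceback embedded in the message. The names sort_file_safe and the body of sort_file below are hypothetical stand-ins (the real sort code is not shown in the question), and the sample inputs are plain strings rather than real CSV paths.

```python
import multiprocessing
import traceback


def sort_file(csv_file):
    # Hypothetical stand-in for the real sort code: it treats its argument
    # as one comma-separated line and fails when there is no third field.
    row = csv_file.split(",")
    return row[2]


def sort_file_safe(csv_file):
    # Runs in the worker. On failure, re-raise with the argument and the
    # worker-side traceback embedded in the message, because the parent
    # may otherwise see only the bare exception.
    try:
        return sort_file(csv_file)
    except Exception:
        raise RuntimeError(
            "sort_file(%r) failed:\n%s" % (csv_file, traceback.format_exc())
        )


def sort_files(csv_files):
    pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
    try:
        return pool.map(sort_file_safe, csv_files, 1)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    try:
        sort_files(["a,b,c", "d,e"])
    except RuntimeError as exc:
        # The message names the failing argument and the line in sort_file.
        print(exc)
```

The wrapper raises a RuntimeError built from a single string, which keeps the exception easy to pickle on its way back to the parent process.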

answered Oct 19 '22 22:10 by Armin Rigo


Check further up in the command output. In Python 3.4 at least, multiprocessing.pool will helpfully print a RemoteTraceback above the parent process traceback. You'll see something like:

multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3.4/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.4/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/path/to/your/code/here.py", line 80, in sort_file
    something = row[index]
IndexError: list index out of range
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "generic.pyx", line 17, in generic.sort_files (/users/cyounker/.pyxbld/temp.linux-x86_64-2.7/pyrex/generic.c:1723)
    sorted_dicts = pool.map(sort_file, csv_files, 1)
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 227, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 528, in get
    raise self._value
IndexError: list index out of range

In the case above, the code raising the error is at /path/to/your/code/here.py, line 80.
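
If the output is hard to scroll through, the same worker-side traceback can probably be read programmatically. This is a rough sketch for Python 3.4+, assuming the re-raised exception carries the RemoteTraceback as its __cause__ (which is what the "direct cause" line in the output reflects). The helper remote_traceback_text and the body of sort_file are hypothetical, introduced only for illustration.

```python
import multiprocessing


def remote_traceback_text(exc):
    # On Python 3.4+, an exception re-raised by Pool has __cause__ set to a
    # RemoteTraceback whose str() is the worker-side traceback text.
    cause = getattr(exc, "__cause__", None)
    return str(cause) if cause is not None else None


def sort_file(csv_file):
    # Hypothetical stand-in for the real sort code: it treats its argument
    # as one comma-separated line and fails when there is no third field.
    row = csv_file.split(",")
    return row[2]


if __name__ == "__main__":
    with multiprocessing.Pool(2) as pool:
        try:
            pool.map(sort_file, ["a,b,c", "d,e"], 1)
        except IndexError as exc:
            # Prints the traceback recorded inside the worker process.
            print(remote_traceback_text(exc))
```

This depends on how multiprocessing.pool attaches the remote traceback, so treat it as a debugging aid rather than a stable interface.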

See also: debugging errors in python multiprocessing

answered Oct 19 '22 23:10 by waterproof