
Python multiprocessing.Pool map() "TypeError: string indices must be integers, not str"

I am attempting to use multiprocessing.Pool to do parallel processing on a list of dictionaries. An example is below.

(Please note: this is a toy example; my actual code will do CPU-intensive processing on the values in the real dictionaries.)

import multiprocessing

my_list = [{'letter': 'a'}, {'letter': 'b'}, {'letter': 'c'}]

def process_list(list_elements):
    ret_list = []
    for my_dict in list_elements:
        ret_list.append(my_dict['letter'])
    return ret_list

if __name__ == "__main__":
    pool = multiprocessing.Pool()
    letters = pool.map(process_list, my_list)
    print letters

If I run the code above, I get the following error:

Traceback (most recent call last):
  File "multiprocess_fail.py", line 13, in <module>
    letters = pool.map(process_list, my_list)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 250, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 554, in get
    raise self._value
TypeError: string indices must be integers, not str

I don't know what string indices it is referring to. Shouldn't pool.map just be iterating over the items in my_list (i.e. the dictionaries)? Do I have to alter how the data is being passed to the map function to get it to run?

Vector asked Mar 14 '14 17:03


1 Answer

pool.map() takes a callable and an iterable, then applies the callable to each element of the iterable. It divides the work across the pool workers in chunks, but the function is only ever passed one element at a time.

You passed in a list of dictionaries, which means that each call to process_list() receives a single dictionary:

process_list({'letter': 'a'})
process_list({'letter': 'b'})
# etc.
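A minimal sketch of how to check this yourself, assuming a helper named describe() (hypothetical, not part of the question) that reports the type of whatever argument the worker receives:

```python
import multiprocessing

my_list = [{'letter': 'a'}, {'letter': 'b'}, {'letter': 'c'}]

def describe(element):
    # Hypothetical helper: return the type name of the argument
    # that pool.map() hands to a single call.
    return type(element).__name__

if __name__ == "__main__":
    pool = multiprocessing.Pool()
    try:
        # Each call receives one dict, never the whole list.
        print(pool.map(describe, my_list))
    finally:
        pool.close()
        pool.join()
```

Every entry in the printed list should be 'dict', one per element of my_list.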

Your code, however, treats list_elements as a list. The for loop:

for my_dict in list_elements:

instead iterates over the dictionary's keys, so my_dict is bound to one key at a time. Each of your dictionaries has a single key, so the loop runs once per call and my_dict is always the string 'letter'. The line:

my_dict['letter']

then tries to index into that string, and 'letter'['letter'] raises the exception you saw.
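The same failure can be reproduced without multiprocessing at all. This is a small sketch using illustrative variable names (element, keys_seen, error_message) that are not in the question; the exact wording of the error message varies slightly between Python versions:

```python
element = {'letter': 'a'}

# Iterating over a dict yields its keys, not its values.
keys_seen = [item for item in element]

try:
    # keys_seen[0] is the string 'letter', so this is 'letter'['letter'].
    keys_seen[0]['letter']
except TypeError as exc:
    error_message = str(exc)

print(keys_seen)
print(error_message)
```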

The following works:

def process_list(list_element):
    return list_element['letter']

Each call returns one result; map() gathers all results into a new list, in input order, and returns it once all workers are done.
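If you would rather keep process_list() working on a list, the other option is to change what you pass to map(): hand it a list of sub-lists, so each call receives several dictionaries. The sketch below assumes a helper named chunked() and a chunk size of 2; both are illustrative choices, not something from the question.

```python
import multiprocessing

my_list = [{'letter': 'a'}, {'letter': 'b'}, {'letter': 'c'}]

def process_list(list_elements):
    # Receives a list of dicts, as the original function expected.
    return [my_dict['letter'] for my_dict in list_elements]

def chunked(items, size):
    # Hypothetical helper: split items into consecutive sub-lists
    # of at most `size` elements.
    return [items[i:i + size] for i in range(0, len(items), size)]

if __name__ == "__main__":
    pool = multiprocessing.Pool()
    try:
        # Each call now gets one sub-list; the result is a list of lists.
        nested = pool.map(process_list, chunked(my_list, 2))
    finally:
        pool.close()
        pool.join()
    # Flatten the per-chunk results back into a single list.
    letters = [letter for sub in nested for letter in sub]
    print(letters)
```

For CPU-heavy work on many small items this can reduce per-call overhead, although map() already batches elements internally through its chunksize argument, so the simpler one-element version is usually enough.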

Martijn Pieters answered Nov 14 '22 22:11