I am attempting to use multiprocessing.Pool to do parallel processing on a list of dictionaries. An example is below.
(Please note: this is a toy example; my actual code will do CPU-intensive processing on the values in the actual dictionaries.)
import multiprocessing

my_list = [{'letter': 'a'}, {'letter': 'b'}, {'letter': 'c'}]

def process_list(list_elements):
    ret_list = []
    for my_dict in list_elements:
        ret_list.append(my_dict['letter'])
    return ret_list

if __name__ == "__main__":
    pool = multiprocessing.Pool()
    letters = pool.map(process_list, my_list)
    print letters
If I run the code above, I get the following error:
Traceback (most recent call last):
  File "multiprocess_fail.py", line 13, in <module>
    letters = pool.map(process_list, my_list)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 250, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 554, in get
    raise self._value
TypeError: string indices must be integers, not str
I don't know what string indices it is referring to. Shouldn't pool.map just be iterating over the items in my_list (i.e. the dictionaries)? Do I have to alter how the data is being passed to the map function to get it to run?
pool.map() takes a callable and an iterable, then applies the callable to each element of the iterable. It divides the work across the pool workers in chunks, but the function is only ever passed one element at a time.
You passed in a list of dictionaries, which means that each process_list() call is passed one dictionary:
process_list({'letter': 'a'})
process_list({'letter': 'b'})
# etc.
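To make the "one element per call" behaviour concrete, here is a minimal sketch. It uses the built-in map() as a stand-in for pool.map(), on the assumption that only the way arguments are handed to the function matters here; describe is a hypothetical helper, not part of the original code.

```python
my_list = [{'letter': 'a'}, {'letter': 'b'}, {'letter': 'c'}]

def describe(element):
    # Hypothetical helper: report the type of whatever map() passes in.
    return type(element).__name__

# The built-in map() stands in for pool.map() here; both call the
# function once per element of the iterable.
received = list(map(describe, my_list))
print(received)  # ['dict', 'dict', 'dict']
```

Each call receives a single dictionary, never the whole list.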
Your code, however, treats list_elements as a list. The for loop:
for my_dict in list_elements:
instead sees dictionary keys; my_dict is bound to one key at a time. For your dictionaries, that means there is a single iteration, with my_dict set to 'letter'. The line:
my_dict['letter']
then tries to index into that string, and 'letter'['letter'] raises the exception you saw.
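The failure can be reproduced without any multiprocessing at all. The sketch below is only an illustration; the variable names are made up for the example, and the exact wording of the error message varies between Python versions.

```python
single_element = {'letter': 'a'}  # what one process_list() call receives

# Iterating over a dictionary yields its keys, not nested dictionaries.
keys_seen = [item for item in single_element]
print(keys_seen)  # ['letter']

key = keys_seen[0]
try:
    key['letter']  # indexing the string 'letter' with another string
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```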
The following works:
def process_list(list_element):
    return list_element['letter']
You return one result per call; map() gathers all the results into a new list and returns it once all workers are done.
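Put together, a complete script might look like the sketch below. The function is renamed to process_element here purely for clarity (that name and main() are not from the original), and print() is written with parentheses so the script should run on both Python 2.7 and Python 3.

```python
import multiprocessing

my_list = [{'letter': 'a'}, {'letter': 'b'}, {'letter': 'c'}]

def process_element(element):
    # Called once per dictionary; returns a single result.
    return element['letter']

def main():
    pool = multiprocessing.Pool()
    try:
        # map() preserves input order and collects the results in a list.
        letters = pool.map(process_element, my_list)
    finally:
        pool.close()
        pool.join()
    print(letters)

if __name__ == "__main__":
    main()
```

If you really do want each worker call to handle several dictionaries at once, you would have to pass an iterable of lists yourself; the optional chunksize argument of pool.map() only controls how elements are batched for dispatch, not what the function receives.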