I've got the following issue. I'm trying to refactor my code to process API calls using multithreading. My core data is a simple list of tuples in the following format:
lst = [('/Users/sth/photo1.jpg', '/Users/sth/photo2'),
       ('/Users/sth/photo1.jpg', '/Users/sth/photo3'),
       (...)]
The function I use takes the lst list and processes it through an API that requires a pair of photos. In the end, a single number is returned for each pair. So far, I'm using a loop to pass each tuple to my function and produce that number. I would like to parallelize the whole computation so that each worker takes a part of my list and calls the function for the tuples in its batch. To do that, I tried to use Pool from the multiprocessing module (the thread-based version in multiprocessing.dummy):
from multiprocessing.dummy import Pool as ThreadPool
pool = ThreadPool(2)
results = pool.map(score_function, lst)
However, the following error occurs:
IOError: [Errno 2] No such file or directory: 'U'
Something strange is happening here: it seems to treat a single character from my tuple as an argument. Any ideas how to do it properly?
Thank you
Edit:
Not including the score_function definition was my bad. Let me update the question:
def score_function(pairs):
    score_list = list()
    for pair in pairs:
        score = findElement(target=pair[0], source=pair[1])
        score_list.append([pair[0], pair[1], score])
    return score_list
Where findElement is defined as:
def findElement(target, source):
    with open(source, 'rb') as source_:
        source_bytes = source_.read()
    with open(target, 'rb') as target_:
        target_bytes = target_.read()
    score = API_request(target_bytes=target_bytes,
                        source_bytes=source_bytes)
    return score
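You can use the starmap function instead of map. starmap (available since Python 3.3) unpacks each tuple into separate arguments, so the function you pass has to take two parameters, such as findElement(target, source), rather than the one-parameter score_function(pairs):
from multiprocessing import Pool

# The main-module guard is required where processes are started with "spawn" (Windows, and macOS since Python 3.8)
if __name__ == '__main__':
    pool = Pool(processes=4)
    # starmap calls findElement(target, source) for each (target, source) tuple in lst
    results = pool.starmap(findElement, lst)
    pool.close()
    pool.join()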
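A minimal, self-contained sketch of the starmap approach, assuming a hypothetical fake_find_element in place of findElement (the real one reads files and calls your API; this one just returns its arguments plus a made-up score from the path lengths). It uses the thread-based Pool from multiprocessing.dummy, which also provides starmap and fits I/O-bound API calls; with threads, no main-module guard is needed.

```python
from multiprocessing.dummy import Pool as ThreadPool

# Hypothetical stand-in for findElement: no file I/O and no API call.
# It returns both paths plus a fake score (sum of the path lengths).
def fake_find_element(target, source):
    return (target, source, len(target) + len(source))

lst = [('/Users/sth/photo1.jpg', '/Users/sth/photo2'),
       ('/Users/sth/photo1.jpg', '/Users/sth/photo3')]

with ThreadPool(2) as pool:
    # starmap unpacks each tuple: fake_find_element(target, source)
    results = pool.starmap(fake_find_element, lst)

print(results)
```

Each call receives two whole path strings, not characters, so the unpacking problem from the question cannot occur.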
Your problem is the for loop. It breaks your tuple into individual strings. Do this and it should work:
def score_function(pairs):
    score_list = list()
    score = findElement(target=pairs[0], source=pairs[1])
    score_list.append([pairs[0], pairs[1], score])
    return score_list
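Alternatively, if you want to keep the loop-based score_function from the question, so that each worker handles a batch of pairs as originally described, you could split lst into batches yourself and map over the batches. A minimal sketch, assuming a hypothetical fake_find_element in place of findElement (no files or API involved), a hypothetical chunks helper, and an illustrative lst built from the question's path pattern:

```python
from multiprocessing.dummy import Pool as ThreadPool

# Hypothetical stand-in for findElement (no file I/O, no API call).
def fake_find_element(target, source):
    return len(target) + len(source)

# Loop-based version from the question: expects a *list of pairs*.
def score_function(pairs):
    score_list = []
    for pair in pairs:
        score = fake_find_element(target=pair[0], source=pair[1])
        score_list.append([pair[0], pair[1], score])
    return score_list

# Hypothetical helper: split a list into consecutive batches of `size` items.
def chunks(items, size):
    return [items[i:i + size] for i in range(0, len(items), size)]

# Illustrative input following the question's path pattern.
lst = [('/Users/sth/photo1.jpg', f'/Users/sth/photo{i}') for i in range(2, 6)]

with ThreadPool(2) as pool:
    # Each call receives one batch (a list of tuples), so the for loop works.
    batch_results = pool.map(score_function, chunks(lst, 2))

# Flatten the per-batch lists into one list of [target, source, score].
results = [row for batch in batch_results for row in batch]
print(results)
```

Note that pool.map also accepts a chunksize argument, but it only controls how many elements are sent to a worker at once; the mapped function is still called once per element.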
You probably assumed that score_function would receive the whole lst as its parameter. That does not happen. pool.map takes an iterable (here, your list), splits it into individual elements, and calls score_function with exactly one element at a time until your workers have processed the whole list. Each call receives only the one element it is supposed to work on. Your elements are tuples (path1, path2), so when you loop over such a tuple with for, each iteration gives you a single path (a string), and pair[0] and pair[1] are just the first and second characters of that string: '/' and 'U'. Since findElement opens source first, it tries to open 'U', which is exactly the error you see.
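The mechanism can be checked without any pool. A quick sketch using the first tuple from the question as the single element that pool.map would pass to the loop-based score_function:

```python
# One element of lst: exactly what pool.map passes to score_function per call.
pairs = ('/Users/sth/photo1.jpg', '/Users/sth/photo2')

seen = []
for pair in pairs:
    # pair is a whole path string here, so pair[0] and pair[1] are characters
    seen.append((pair[0], pair[1]))

print(seen)  # [('/', 'U'), ('/', 'U')]
```

So findElement is called with target='/' and source='U', and open('U') raises the IOError from the question.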
Hope this helps.