Parallelizing a list comprehension in Python

someList = [x for x in someList if not isOlderThanXDays(x, XDays, DtToday)]

I have this line, and the function isOlderThanXDays makes some API calls, which makes it slow. I would like to run it using multiprocessing or parallel processing in Python. The order in which the list is processed doesn't matter (so asynchronous, I think).

The function isOlderThanXDays essentially returns a boolean, and the list comprehension keeps every item that is not older than XDays in the new list.

Edit: Parameters of the function: XDays is passed in by the user, let's say 60 days, and DtToday is today's date (a datetime object). I then make API calls to read the file's modified date from its metadata and return True if the file is older than XDays, otherwise False.
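For illustration, here is a minimal sketch of what such a check might look like. The real API call is not shown in the question, so `fetch_modified_date` is a hypothetical stand-in that reads from an in-memory dictionary, and the file names and dates are made up.

```python
from datetime import datetime, timedelta

# Hypothetical stand-in for the metadata API: file name -> last modified date.
FAKE_METADATA = {
    "old_report.csv": datetime(2025, 1, 1),
    "new_report.csv": datetime(2025, 11, 10),
}


def fetch_modified_date(name):
    """Hypothetical helper; the real version would make an API call."""
    return FAKE_METADATA[name]


def isOlderThanXDays(x, XDays, DtToday):
    """Return True if x was last modified more than XDays days before DtToday."""
    return DtToday - fetch_modified_date(x) > timedelta(days=XDays)


DtToday = datetime(2025, 11, 16)
someList = ["old_report.csv", "new_report.csv"]
# Sequential version: keep only the items that are not older than 60 days.
someList = [x for x in someList if not isOlderThanXDays(x, 60, DtToday)]
print(someList)
```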

I am looking for something similar to the question below. The difference is that in that question every list input produces an output, whereas mine filters the list based on the boolean returned by the function, so I don't know how to apply it to my scenario:

How to parallelize list-comprehension calculations in Python?

Jay M asked Nov 16 '25 11:11


2 Answers

This should run all of your checks in parallel, and then filter out the ones that failed the check.

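import multiprocessing

try:
    cpus = multiprocessing.cpu_count()
except NotImplementedError:
    cpus = 2   # arbitrary default


def MyFilterFunction(x):
    # Return the keep/drop decision rather than x itself, so that falsy
    # items (0, "", None) are not silently dropped by the final filter.
    return not isOlderThanXDays(x, XDays, DtToday)

if __name__ == "__main__":   # guard needed where workers are spawned (e.g. Windows)
    # The with-block shuts the worker processes down once map() has returned.
    with multiprocessing.Pool(processes=cpus) as pool:
        keep = pool.map(MyFilterFunction, someList)   # one boolean per item, in input order
    newList = [x for x, k in zip(someList, keep) if k]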
Mars answered Nov 18 '25 23:11


You can use ThreadPool:

from multiprocessing.pool import ThreadPool  # thread-backed pool with the same map() API as multiprocessing.Pool
from functools import partial

NUMBER_CALLS_SAME_TIME = 10  # keep this low enough to avoid API throttling
# Assume that the isOlderThanXDays signature is isOlderThanXDays(x, XDays, DtToday)
my_api_call_func = partial(isOlderThanXDays, XDays=XDays, DtToday=DtToday)
pool = ThreadPool(NUMBER_CALLS_SAME_TIME)
responses = pool.map(my_api_call_func, someList)  # one boolean per item, in input order
pool.close()
pool.join()
# Keep only the items that are not older than XDays
someList = [x for x, is_old in zip(someList, responses) if not is_old]
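The same idea can also be sketched with the standard library's `concurrent.futures.ThreadPoolExecutor`, including the final filtering step. This is a self-contained sketch, not the asker's code: `isOlderThanXDays` here is a stand-in that sleeps briefly instead of calling an API, and the `(name, age_in_days)` items and the `keep_recent` helper are invented for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def isOlderThanXDays(x, XDays, DtToday=None):
    """Stand-in for the real check: sleeps to mimic API latency.

    Each item is a hypothetical (name, age_in_days) tuple; DtToday is unused here.
    """
    time.sleep(0.05)
    return x[1] > XDays


def keep_recent(items, XDays, DtToday=None, max_workers=10):
    """Return the items that are NOT older than XDays, checking them concurrently."""
    check = partial(isOlderThanXDays, XDays=XDays, DtToday=DtToday)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in input order, so zip lines them up.
        flags = list(executor.map(check, items))
    return [item for item, is_old in zip(items, flags) if not is_old]


someList = [("a.txt", 10), ("b.txt", 90), ("c.txt", 59), ("d.txt", 61)]
print(keep_recent(someList, XDays=60))
```

Threads tend to be a reasonable fit here because the time is spent waiting on the API rather than computing, and `max_workers` plays the same role as `NUMBER_CALLS_SAME_TIME` in limiting concurrent calls.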
kederrac answered Nov 18 '25 23:11


