My normal (single-process) script handles about 30,000 records in 20 seconds. Given the amount of data I have to run through (over 50 million records), I thought it wise to use Python's multiprocessing.
At the end of my process, I do a database update using SQLAlchemy Core, updating the processed records in batches of 50,000. SQLAlchemy Core requires that you pass it a list (of dicts) to do a bulk update or even a bulk insert. I'll call this list py_list.
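A minimal sketch of that batching step, assuming the processed rows already sit in an ordinary Python list of dicts keyed by the bind-parameter names. `chunked` is a hypothetical helper, not part of SQLAlchemy. Each batch it yields is a real `list` of `dict`s, which is the shape `Connection.execute()` expects when you hand it a list of parameter dictionaries for an executemany-style bulk update.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunked(records: Iterable[dict], size: int = 50_000) -> Iterator[List[dict]]:
    """Yield successive plain lists holding at most `size` records each."""
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Usage in the real script (CONN and stmt as in the question):
#     for batch in chunked(py_list):
#         CONN.execute(stmt, batch)
```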
For Python's multiprocessing, I am capturing the results of the processes via a multiprocessing.Manager().list(), which I will call mp_list.
Everything works fine until I pass mp_list to the SQLAlchemy bulk update statement. This fails with the error AttributeError: 'list' object has no attribute 'keys'. Googling brings me to a question on SO which states that multiprocessing.Manager().list() and multiprocessing.Manager().dict() are not true Python lists/dictionaries.
The question, then, is: how do I convert the manager list into a true Python list?
mp_list is populated as follows:
import multiprocessing

manager = multiprocessing.Manager()
mp_list = manager.list()

def populate_mp_list(pid, is_processed):
    '''Mark the record as having been processed'''
    record = {}  # named "record" to avoid shadowing the built-in dict
    record['b_id'] = pid
    record['is_processed'] = is_processed
    mp_list.append(record)
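A self-contained sketch of the same idea, under a couple of assumptions: the manager list is passed to the worker as an argument instead of living in a module-level global (as far as I know, a module-level Manager() only behaves when workers are forked; with the spawn start method, the default on Windows and macOS, each child re-imports the module and would try to start a manager of its own), and the pool size and record ids are made up for illustration. `run` and the extra `mp_list` parameter are hypothetical. It shows that the proxy is not a list, while slicing it returns an ordinary list of ordinary dicts.

```python
import multiprocessing


def populate_mp_list(mp_list, pid, is_processed):
    """Worker: record one processed row as a plain dict in the shared list."""
    mp_list.append({"b_id": pid, "is_processed": is_processed})


def run(record_ids, processes=4):
    """Hypothetical driver: fill a manager list from a pool, return a plain copy."""
    with multiprocessing.Manager() as manager:
        mp_list = manager.list()
        with multiprocessing.Pool(processes) as pool:
            pool.starmap(populate_mp_list, [(mp_list, pid, True) for pid in record_ids])
        # The proxy only forwards calls to the manager process; it is not a list.
        print(type(mp_list).__name__, isinstance(mp_list, list))
        # Copy the data out while the manager is still running.
        return mp_list[:]


if __name__ == "__main__":
    py_list = run(range(10))
    print(type(py_list).__name__, len(py_list))
```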
The SQLAlchemy code throwing the error is as follows:
from sqlalchemy import bindparam

# Engine and mytable are defined elsewhere in the script
CONN = Engine.connect()
trans = CONN.begin()
stmt = mytable.update().where(mytable.c.id == bindparam('b_id')).\
    values(is_processed=bindparam('is_processed'))
CONN.execute(stmt, mp_list)
trans.commit()
I've tried converting mp_list into a true Python list. The new list works, but the time it takes to create it negates all the time saved by multiprocessing.
For example, I looped over the returned mp_list and built a new list:
y = []
for x in mp_list:
    y.append(x)
I also tried making a "copy" of mp_list by slicing it, but each copy adds a penalty of 3 seconds on average, which ain't cool.
y = mp_list[0:len(mp_list)]
So what is the fastest way to convert the manager list into a list usable by SQLAlchemy Core?
Hope I'm not late.
Doesn't this work?
pythonlist = list(mp_list)
The same thing works for a dict too:
pythondict = dict(mp_dict)
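A small sketch of the dict case, assuming a DictProxy from multiprocessing.Manager().dict(). dict(mp_dict) does work, but as far as I can tell it fetches the keys and then asks the manager for each value in a separate request, whereas mp_dict.copy() (copy is one of the methods DictProxy forwards) comes back as a plain dict from a single request. The `snapshot` function and its data are hypothetical.

```python
import multiprocessing


def snapshot(n=1_000):
    """Hypothetical helper: build a manager dict, then convert it two ways."""
    with multiprocessing.Manager() as manager:
        mp_dict = manager.dict({i: i * i for i in range(n)})
        via_dict = dict(mp_dict)   # keys() once, then one __getitem__ request per key
        via_copy = mp_dict.copy()  # a single call; the manager sends back a plain dict
    return via_dict, via_copy


if __name__ == "__main__":
    via_dict, via_copy = snapshot()
    print(type(via_dict).__name__, type(via_copy).__name__, via_dict == via_copy)
```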
What is the performance of the following?
y = [x for x in mp_list]
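One rough way to answer that empirically, sketched under assumptions: a ListProxy filled with 10,000 small dicts shaped like the question's records, timed once each with time.perf_counter. My understanding of CPython's managers module is that ListProxy forwards __getitem__ and __len__ but not __iter__, so a for-loop, a list comprehension and list(mp_list) all index the proxy one element at a time, each a separate request to the manager process, whereas a slice or the documented BaseProxy._getvalue() fetches the whole list in one request. Absolute numbers will depend on the machine and Python version; treat this as a sketch, not a benchmark.

```python
import multiprocessing
import time


def timed(label, func):
    """Run func once, print how long it took, and return its result."""
    start = time.perf_counter()
    result = func()
    print(f"{label:<22}{time.perf_counter() - start:8.3f}s")
    return result


def compare(n=10_000):
    """Convert the same ListProxy to a plain list in several ways."""
    with multiprocessing.Manager() as manager:
        # manager.list() needs a picklable sequence, so build a real list first
        mp_list = manager.list([{"b_id": i, "is_processed": True} for i in range(n)])
        conversions = {
            "[x for x in mp_list]": lambda: [x for x in mp_list],  # one request per item
            "list(mp_list)": lambda: list(mp_list),                # one request per item
            "mp_list[:]": lambda: mp_list[:],                      # a single request
            "mp_list._getvalue()": mp_list._getvalue,              # a single request
        }
        return {label: timed(label, func) for label, func in conversions.items()}


if __name__ == "__main__":
    compare()
```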
The easy solution is to use list():
result_list = list(proxy_list)
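A different route, not what the question does: skip the manager list entirely and let each worker return its result. With Pool.imap_unordered the results are pickled back to the parent and arrive as ordinary dicts, so the batches are already real lists and there is nothing to convert. `process_record` and `process_all` are hypothetical names, and the body of `process_record` is a placeholder for the real per-record work.

```python
import multiprocessing


def process_record(pid):
    """Hypothetical worker: do the per-record work, then return a plain dict."""
    is_processed = True  # placeholder for the real processing result
    return {"b_id": pid, "is_processed": is_processed}


def process_all(record_ids, batch_size=50_000, processes=None):
    """Yield plain lists of result dicts, at most batch_size at a time."""
    batch = []
    with multiprocessing.Pool(processes) as pool:
        # chunksize hands ids to the workers in groups to cut per-task overhead
        for result in pool.imap_unordered(process_record, record_ids, chunksize=1_000):
            batch.append(result)
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        yield batch


if __name__ == "__main__":
    for batch in process_all(range(120_000)):
        # in the real script this is where CONN.execute(stmt, batch) would go
        print(type(batch).__name__, len(batch))
```

Results come back in completion order rather than input order, which should not matter for an update keyed on b_id.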