
How do I convert a multiprocessing.Manager().list() to a pure Python list?

My normal script processes about 30,000 records in 20 seconds. Given the amount of data I have to run through (over 50 million records), I thought it wise to use Python's multiprocessing.

At the end of my process, I do a database update using SQLAlchemy Core, updating the processed records in batches of 50,000. SQLAlchemy Core requires that you pass it a list to do a bulk update or insert. I'll call this list py_list.
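The batching step described above can be sketched as follows. This is only an illustration: the helper name iter_batches and the sample data are assumptions, not part of the original script, and the record keys simply mirror the bind parameters used elsewhere in this question.

```python
def iter_batches(records, batch_size=50_000):
    """Yield consecutive slices of `records`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


# Illustrative data only; a real run would use the processed records.
py_list = [{"b_id": i, "is_processed": True} for i in range(120_000)]
batch_sizes = [len(batch) for batch in iter_batches(py_list)]
```

Each yielded batch is a plain list of dictionaries, which is the shape a bulk execute call expects as its parameter list.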

For Python's multiprocessing, I am capturing the results of the processes in a multiprocessing.Manager().list(), which I will call mp_list.

Everything works fine until I pass mp_list to the SQLAlchemy bulk update statement. This fails with the error AttributeError: 'list' object has no attribute 'keys'. Googling brings me to a question on SO which states that a multiprocessing.Manager().list(), and even a multiprocessing.Manager().dict(), is not a true Python list or dictionary.
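To make that concrete, here is a minimal sketch that inspects a manager list and a plain-list copy of it. The function name inspect_proxy and the sample record are made up for illustration; the sketch only shows what kind of object the manager hands back and does not involve SQLAlchemy at all.

```python
import multiprocessing


def inspect_proxy():
    """Return facts about a manager list and a plain-list snapshot of it."""
    with multiprocessing.Manager() as manager:
        mp_list = manager.list()
        mp_list.append({"b_id": 1, "is_processed": True})
        snapshot = list(mp_list)  # copy the contents into this process
        return {
            "proxy_type": type(mp_list).__name__,
            "proxy_is_list": isinstance(mp_list, list),
            "snapshot_is_list": type(snapshot) is list,
            "snapshot": snapshot,
        }


if __name__ == "__main__":
    print(inspect_proxy())
```

The manager list is a proxy object that forwards each call to the manager's server process, so it is not an instance of list, while the snapshot is an ordinary list of ordinary dicts.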

The question, then, is: how do I convert the multiprocessing.Manager().list() into a true Python list?

mp_list is populated as follows:

import multiprocessing
manager = multiprocessing.Manager()
mp_list = manager.list()

def populate_mp_list(pid, is_processed):
    '''Mark the record as having been processed'''
    # Use a name other than `dict` so the built-in type is not shadowed.
    record = {'b_id': pid, 'is_processed': is_processed}
    mp_list.append(record)

The SQLAlchemy code throwing the error is as follows:

from sqlalchemy import bindparam

# Engine and mytable are defined elsewhere in the script.
CONN = Engine.connect()
trans = CONN.begin()
stmt = mytable.update().where(mytable.c.id == bindparam('b_id')).\
    values(is_processed=bindparam('is_processed'))
CONN.execute(stmt, mp_list)
trans.commit()

I've tried converting mp_list into a true Python list. The new list works, but the time penalty for creating it negates all the time saved by multiprocessing.

For example, I can loop over the returned mp_list and build a new list:

y = []
for x in mp_list:
    y.append(x)

Also, if I "copy" mp_list with a slice, each copy adds a penalty of about 3 seconds on average, which ain't cool:

y = mp_list[0:len(mp_list)]

So, which would be the fastest way to convert the multiprocessing.manager.list into a list usable by SQLAlchemy Core?

lukik asked Dec 18 '13 17:12



3 Answers

Hope I'm not late.

Doesn't this work?

pythonlist = list(mp_list)

The same thing works for a dict too:

pythondict = dict(mp_dict)
lionel319 answered Nov 16 '22 23:11



What is the performance of:

y = [x for x in mp_list]

?
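One way to find out is to time the candidates side by side. The sketch below is only a rough harness under assumptions: the function name time_conversions, the record shape, and the record count are made up for illustration, and the timings will differ between machines and Python versions, so no particular ranking is claimed here.

```python
import multiprocessing
import time


def time_conversions(n_records=2_000):
    """Time three ways of copying a manager list into a plain list.

    Returns {label: (seconds, resulting_list)}; timings vary by machine.
    """
    with multiprocessing.Manager() as manager:
        # Populate in one call; the record shape is illustrative.
        mp_list = manager.list(
            [{"b_id": i, "is_processed": True} for i in range(n_records)]
        )
        candidates = {
            "list(mp_list)": lambda: list(mp_list),
            "comprehension": lambda: [x for x in mp_list],
            "mp_list[:]": lambda: mp_list[:],
        }
        results = {}
        for label, convert in candidates.items():
            start = time.perf_counter()
            converted = convert()
            results[label] = (time.perf_counter() - start, converted)
        return results


if __name__ == "__main__":
    for label, (seconds, converted) in time_conversions().items():
        print(f"{label:15s} {seconds:.4f}s  {len(converted)} records")
```

Since every call on the proxy is forwarded to the manager process, it is worth checking on your own data how many such calls each variant ends up making.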

Mayur Patel answered Nov 17 '22 00:11



The easy solution is to call list() on the proxy:

result_list = list(proxy_list)
Junior_K27 answered Nov 17 '22 00:11
