 

I need an efficient shared dictionary in a Python multiprocessing environment

I have implemented a one-producer, multiple-consumer pattern using Python's multiprocessing package. The consumers put their results into a dictionary whose keys are words and whose values are large SciPy sparse matrices. For each word it sees, a consumer adds its value to the main vector for that word in the shared dictionary.

I have used Manager.dict() to implement this shared dictionary, but it is very slow: CPU utilization is about 15% per process, which is only a little better than a single process. Each consumer fetches an item from the shared dictionary, adds a sparse matrix to that item's value, and writes the item back to the shared dictionary.
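The read-modify-write loop described above can be sketched as follows (a minimal, runnable sketch: integer counts stand in for the SciPy sparse matrices). Every access to the proxy dict pickles the value and round-trips through the manager process, which is why CPU utilization stays low; note too that fetch-then-store is not atomic, so concurrent consumers can silently overwrite each other's updates.

```python
import multiprocessing as mp

def consumer(shared, words):
    # Each iteration does a remote fetch and a remote store through the
    # manager process; with large values, pickling dominates the runtime.
    for word in words:
        vec = shared.get(word, 0)   # stand-in for fetching a sparse matrix
        shared[word] = vec + 1      # stand-in for vec + sparse_update
                                    # NB: this read-modify-write is not atomic

def run_demo(n_consumers=4, words=("a", "b") * 100):
    ctx = mp.get_context()          # default context (fork on Linux)
    with ctx.Manager() as manager:
        shared = manager.dict()
        procs = [ctx.Process(target=consumer, args=(shared, list(words)))
                 for _ in range(n_consumers)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        return dict(shared)

if __name__ == "__main__":
    print(run_demo())
```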

Is there any more efficient solution?

Ash asked Mar 27 '14 10:03

1 Answer

import memcache

# Connect to a local memcached server and store/retrieve a dictionary.
memc = memcache.Client(['127.0.0.1:11211'], debug=1)
memc.set('top10candytypes', {1: 2, "3": [4, 5, 6]})

bestCandy = memc.get('top10candytypes')
print(bestCandy)

I'm no expert on memcache because I've just started using it myself. But it's handy as hell if you have multiple threads or processes needing to access the same data, or if you simply need to store things efficiently without running out of RAM.
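That said, memcache still costs a network round trip per key, so for this particular producer/consumer aggregation the shared dictionary can often be avoided entirely: let each consumer accumulate into a private local dict and send it back once, so the parent does a single merge. Here is a minimal sketch of that alternative (my suggestion, not from the answer above; integer counts stand in for the sparse matrices, and it assumes the merged result fits in the parent's memory):

```python
import multiprocessing as mp
from collections import Counter

def consumer(jobs, results):
    # Aggregate locally -- no shared state is touched in the hot loop.
    local = Counter()              # stand-in for {word: sparse_matrix}
    for word in iter(jobs.get, None):
        local[word] += 1           # stand-in for local[word] += sparse_update
    results.put(dict(local))       # one big transfer instead of many small ones

def run(words, n_consumers=4):
    jobs, results = mp.Queue(), mp.Queue()
    procs = [mp.Process(target=consumer, args=(jobs, results))
             for _ in range(n_consumers)]
    for p in procs:
        p.start()
    for w in words:
        jobs.put(w)
    for _ in procs:
        jobs.put(None)             # one stop sentinel per consumer
    merged = Counter()
    for _ in procs:
        merged.update(results.get())   # single merge step in the parent
    for p in procs:
        p.join()
    return dict(merged)

if __name__ == "__main__":
    print(run(["a", "b", "a"] * 100))
```

With sparse matrices, the per-word merge in the parent would be a matrix addition instead of `Counter.update`, but the shape of the solution is the same: communicate once per consumer, not once per word.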

Torxed answered Sep 22 '22 02:09