I have implemented a one-producer, multiple-consumer pattern using Python's multiprocessing package. The consumers put their results in a shared dictionary whose keys are words and whose values are large SciPy sparse matrices. For each word it sees, a consumer adds its own matrix to the accumulated matrix for that word in the shared dictionary.
I used Manager.dict() to implement this shared dictionary, but it is very slow: CPU utilization is about 15% per process, and the whole thing is only slightly faster than a single process. Each consumer fetches an item from the shared dictionary, adds a sparse matrix to its value, and writes the item back.
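For reference, a minimal sketch of what I am doing looks roughly like the code below (the word list, matrix sizes, and worker count are made up for illustration; the real matrices are much larger). The read-modify-write against the managed dict, where every access is a round trip to the manager process, is the part that seems to be the bottleneck.

import multiprocessing as mp

import scipy.sparse as sp


def consumer(task_queue, shared_dict, lock):
    """Fetch (word, matrix) tasks and accumulate matrices per word."""
    for word, matrix in iter(task_queue.get, None):  # None is the poison pill
        with lock:
            # Every get/set here is proxied through the manager process.
            current = shared_dict.get(word)
            shared_dict[word] = matrix if current is None else current + matrix


if __name__ == "__main__":
    manager = mp.Manager()
    shared_dict = manager.dict()
    lock = manager.Lock()
    task_queue = mp.Queue()

    workers = [
        mp.Process(target=consumer, args=(task_queue, shared_dict, lock))
        for _ in range(4)
    ]
    for w in workers:
        w.start()

    # Producer: emit a few (word, sparse matrix) tasks, then one poison pill per worker.
    for word in ["cat", "dog", "cat", "fish"]:
        task_queue.put((word, sp.random(1000, 1000, density=0.001, format="csr")))
    for _ in workers:
        task_queue.put(None)

    for w in workers:
        w.join()

    print({word: m.nnz for word, m in shared_dict.items()})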
Is there any more efficient solution?
import memcache

# Connect to a local memcached server and round-trip a small dictionary.
memc = memcache.Client(['127.0.0.1:11211'], debug=1)
memc.set('top10candytypes', {1: 2, "3": [4, 5, 6]})

bestCandy = memc.get('top10candytypes')
print(bestCandy)
I'm no expert on memcache because I've just started to use it myself, but it's handy as hell if you have multiple threads needing to access the same data, or if you simply need to store things efficiently without running out of RAM.
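If you wanted to try it for the per-word matrices from your question, a rough sketch might look like the following. The `wordvec:` key prefix is just a made-up convention, and two caveats apply: memcached caps values at 1 MB by default (so very large matrices won't be stored unless you raise that limit), and get-then-set is not atomic, so concurrent consumers would still need their own locking or a check-and-set loop.

import memcache
import scipy.sparse as sp

memc = memcache.Client(['127.0.0.1:11211'], debug=1)


def add_to_word(word, matrix):
    """Fetch the current matrix for `word`, add `matrix`, and store it back."""
    key = 'wordvec:%s' % word  # hypothetical key scheme
    current = memc.get(key)
    # python-memcache pickles arbitrary objects, so sparse matrices round-trip,
    # but this read-modify-write is NOT atomic across processes.
    memc.set(key, matrix if current is None else current + matrix)


add_to_word('cat', sp.random(100, 100, density=0.01, format='csr'))
add_to_word('cat', sp.random(100, 100, density=0.01, format='csr'))
print(memc.get('wordvec:cat').nnz)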