Python/Redis Multiprocessing

I'm using Pool.map from the multiprocessing library to iterate through a large XML file and save word and n-gram counts into a set of three Redis servers (which sit completely in memory). But for some reason all 4 CPU cores sit around 60% idle the whole time. The server has plenty of RAM, and iotop shows that there is no disk I/O happening.

I have 4 Python threads and 3 Redis servers running as daemons on three different ports. Each Python thread connects to all three servers.

The number of Redis operations on each server is well below what it's benchmarked as capable of.

I can't find the bottleneck in this program. What would be likely candidates?

asked Dec 14 '10 17:12

albertsun

1 Answer

Network latency may be contributing to the idle CPU time in your Python client application. If the network latency between client and server is as little as 2 milliseconds, and you perform 10,000 Redis commands one after another, your application must sit idle for at least 20 seconds, regardless of the speed of any other component.
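The arithmetic behind that 20-second figure can be sketched as follows. This is only a lower-bound estimate that assumes every command waits for the previous reply; the function name is made up for illustration.

```python
def min_idle_seconds(round_trip_ms: float, num_commands: int) -> float:
    """Lower bound on time spent waiting when each command is sent
    only after the previous reply has arrived (no batching)."""
    return round_trip_ms * num_commands / 1000


# 2 ms of latency and 10,000 sequential commands, as in the example above
print(min_idle_seconds(2, 10_000))  # 20.0
```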

Using multiple Python threads can help, but each thread will still go idle when a blocking command is sent to the server. Unless you have very many threads, they will often synchronize and all block waiting for a response. Because each thread is connecting to all three servers, the chances of this happening are reduced, except when all are blocked waiting for the same server.

Assuming you have uniformly random access across the servers to service your requests (by hashing on key names to implement sharding or partitioning), the odds that any two random requests hash to the same Redis server are inversely proportional to the number of servers. For 1 server, they hash to the same server 100% of the time; for 2 servers it's 50% of the time; for 3 it's 33% of the time. With 4 threads and 3 servers, whenever all four threads have a request in flight, at least two of them must be waiting on the same server. Redis is single-threaded when handling data operations, so it must process each request one after another. Your observation that your CPU cores sit around 60% idle is consistent with your threads spending much of their time blocked on network latency, often queued behind one another on the same server.
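To put numbers on the collision odds under the same uniform-hashing assumption, here is a small sketch using exact fractions. The 1/n figures are the chance that any two requests land on the same server; the chance that all four workers land on one server at the same moment is smaller. The function names are illustrative only.

```python
from fractions import Fraction


def p_pair_same_server(num_servers: int) -> Fraction:
    """Chance that two independent, uniformly hashed requests hit the same server."""
    return Fraction(1, num_servers)


def p_all_same_server(num_workers: int, num_servers: int) -> Fraction:
    """Chance that all num_workers in-flight requests hit one and the same server.
    The first request picks any server; each of the others must match it."""
    return Fraction(1, num_servers) ** (num_workers - 1)


for n in (1, 2, 3):
    print(n, "server(s):", p_pair_same_server(n))

# Four workers, three servers, as in the question
print(p_all_same_server(4, 3))  # 1/27
```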

Continuing the assumption that you are implementing client-side sharding by hashing on key names, you can eliminate the contention between threads by assigning each thread a single server connection and evaluating the partitioning hash before passing a request to a worker thread. This ensures that each thread waits on a different server, so their network round trips overlap instead of queuing behind one another. But there may be an even better improvement from pipelining.
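A minimal sketch of evaluating the partitioning hash up front might look like this. The choice of zlib.crc32 as the hash is an assumption made for illustration (the question does not say how keys are sharded), and shard_for and route are hypothetical helper names. Note that Python's built-in hash() is randomized per process for strings, so it is not suitable for routing across worker processes.

```python
import zlib
from collections import defaultdict
from typing import Dict

NUM_SERVERS = 3  # from the question: three Redis instances


def shard_for(key: str, num_servers: int = NUM_SERVERS) -> int:
    """Stable hash of the key name, reduced to a server index (assumed scheme)."""
    return zlib.crc32(key.encode("utf-8")) % num_servers


def route(counts: Dict[str, int], num_servers: int = NUM_SERVERS) -> Dict[int, Dict[str, int]]:
    """Group counts by destination server, so that each group can be handed
    to the one worker that owns the connection to that server."""
    by_server = defaultdict(dict)
    for key, amount in counts.items():
        by_server[shard_for(key, num_servers)][key] = amount
    return dict(by_server)


counts = {"the": 5, "quick": 2, "the quick": 1, "fox": 7}
for server_index, group in sorted(route(counts).items()):
    print(server_index, group)
```

Each worker then only ever talks to its own server, and the dispatcher decides which worker receives which group.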

You can reduce the impact of network latency by using the pipeline feature of the redis-py module, if you don't need an immediate result from the server. This may be viable for you, since it seems you are only storing the results of data processing into Redis. To implement this using redis-py, periodically obtain a pipeline handle from an existing Redis connection object using the .pipeline() method, and invoke multiple store commands against that new handle the same way you would against the primary redis.Redis connection object. Then invoke .execute() to send the batch and block on the replies. You can get orders of magnitude improvement by using pipelining to batch tens or hundreds of commands together. Your client thread won't block until you call .execute() on the pipeline handle.
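The batching described above could be sketched roughly as follows. The function name, the batch size of 100, and the use of INCRBY for the counts are assumptions for illustration; the client argument is assumed to be a redis.Redis instance, whose pipeline(), incrby() and execute() methods are used here.

```python
def store_counts(client, counts, batch_size=100):
    """Queue one INCRBY per (key, amount) pair and flush every batch_size commands.

    `client` is assumed to be a redis.Redis instance (or any object offering
    the same pipeline()/incrby()/execute() methods).
    Returns the number of network round trips that were made.
    """
    pipe = client.pipeline(transaction=False)  # plain batching, no MULTI/EXEC
    queued = 0
    round_trips = 0
    for key, amount in counts.items():
        pipe.incrby(key, amount)  # buffered locally, nothing is sent yet
        queued += 1
        if queued == batch_size:
            pipe.execute()  # one round trip for the whole batch
            round_trips += 1
            queued = 0
    if queued:  # flush the final partial batch
        pipe.execute()
        round_trips += 1
    return round_trips


# Usage against a real server (assumes redis-py is installed and a server
# is listening on localhost:6379):
#   import redis
#   store_counts(redis.Redis(host="localhost", port=6379), {"word:the": 42})
```

With 10,000 keys and a batch size of 100, the worker blocks on 100 round trips instead of 10,000.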

If you apply both changes, so that each worker thread communicates with just one server and pipelines multiple commands together (at least 5-10 to see a significant result), you may see greater CPU usage in the client (nearer to 100%). The CPython GIL will still limit each client process to one core, but it sounds like you are already using the other cores for the XML parsing via the multiprocessing module.

There is a good writeup about pipelining on the redis.io site.

answered Oct 18 '22 04:10

Will Pierce

Tags:

python

redis

multiprocessing