So I have a small and simple Redis database. It contains 136689 keys whose values are hash maps containing 27 fields. I'm accessing the table through the Python interface on the server node, and need to load about 1000-1500 values per call (eventually I will see around 10 requests per second). A simple call looks something like this:
# below, keys is a list of approximately 1000 integers,
# not all of which are in the table
import redis

db = redis.StrictRedis(
    host='127.0.0.1',
    port=6379,
    db=0,
    socket_timeout=1,
    socket_connect_timeout=1,
    decode_responses=True
)

with db.pipeline() as pipe:
    for key in keys:
        pipe.hgetall(key)
    results = zip(keys, pipe.execute())
The overall time for this is ~328ms with an average time per request of ~0.25ms.
Question: This is very slow for a small database and relatively few queries per second. Is there something wrong with my configuration or the way I'm calling the server? Can something be done to make this faster? I don't expect the table to get much bigger so I'm perfectly happy sacrificing disk space for speed.
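One client-side variation worth measuring: redis-py pipelines are transactional by default (the batch is wrapped in MULTI/EXEC), and very large batches can be split into chunks. Below is a minimal sketch of a non-transactional, chunked version of the call above; `fetch_all` and `chunked` are hypothetical helper names, and `db` is assumed to be the client from the snippet above:

```python
def chunked(seq, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def fetch_all(db, keys, chunk_size=500):
    """Fetch one hash per key, using one non-transactional pipeline per chunk."""
    results = []
    for chunk in chunked(keys, chunk_size):
        # transaction=False skips the MULTI/EXEC wrapper redis-py adds by default
        with db.pipeline(transaction=False) as pipe:
            for key in chunk:
                pipe.hgetall(key)
            results.extend(pipe.execute())
    return dict(zip(keys, results))
```

Whether this helps depends on where the time actually goes; it only removes the transaction bookkeeping, not the serialization or round-trip cost.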
Calling hgetall on each key (without the pipeline) is slower (as expected) and reveals that the time distribution is bimodal: the smaller peak corresponds to keys that are not in the table, and the bigger one to keys that are.
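That split can be checked directly by timing each call and bucketing the latencies by whether the key was found. A sketch, assuming the `db` client from above (`split_latencies` is a hypothetical helper name):

```python
import time

def split_latencies(db, keys):
    """Time HGETALL per key; bucket latencies by whether the key existed."""
    hits, misses = [], []
    for key in keys:
        start = time.perf_counter()
        value = db.hgetall(key)  # empty dict means the key was absent
        elapsed = time.perf_counter() - start
        (hits if value else misses).append(elapsed)
    return hits, misses
```

Plotting histograms of the two buckets should reproduce the two peaks separately.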
My conf file is as follows:
port 6379
daemonize yes
save ""
bind 127.0.0.1
tcp-keepalive 300
dbfilename mytable.rdb
dir .
rdbcompression yes
appendfsync no
no-appendfsync-on-rewrite yes
loglevel notice
I launch the server with:
> echo never > /sys/kernel/mm/transparent_hugepage/enabled
> redis-server myconf.conf
I've also measured the intrinsic latency with redis-cli --intrinsic-latency 100
which gives:
Max latency so far: 1 microseconds.
Max latency so far: 10 microseconds.
Max latency so far: 11 microseconds.
Max latency so far: 12 microseconds.
Max latency so far: 18 microseconds.
Max latency so far: 32 microseconds.
Max latency so far: 34 microseconds.
Max latency so far: 38 microseconds.
Max latency so far: 48 microseconds.
Max latency so far: 52 microseconds.
Max latency so far: 60 microseconds.
Max latency so far: 75 microseconds.
Max latency so far: 94 microseconds.
Max latency so far: 120 microseconds.
Max latency so far: 281 microseconds.
Max latency so far: 413 microseconds.
Max latency so far: 618 microseconds.
1719069639 total runs (avg latency: 0.0582 microseconds / 58.17 nanoseconds per run).
Worst run took 10624x longer than the average latency.
This suggests that I should be able to get much better latency. However, when I check the server latency with: > redis-cli --latency -h 127.0.0.1 -p 6379
I get min: 0, max: 2, avg: 0.26 (2475 samples)
This suggests that the ~0.25ms per request I'm seeing from Python matches the server latency reported by the CLI, but both still seem very slow.
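One way to separate per-command round-trip cost from Python client overhead is to time sequential PINGs against a pipelined batch of PINGs from the same process. A sketch, assuming the `db` client from above (`time_pings` and `time_pipelined` are hypothetical helper names):

```python
import time

def time_pings(db, n=1000):
    """Average cost of n sequential PINGs: one network round trip each."""
    start = time.perf_counter()
    for _ in range(n):
        db.ping()
    return (time.perf_counter() - start) / n

def time_pipelined(db, n=1000):
    """Average per-command cost when all n PINGs share one round trip."""
    start = time.perf_counter()
    with db.pipeline(transaction=False) as pipe:
        for _ in range(n):
            pipe.ping()
        pipe.execute()
    return (time.perf_counter() - start) / n
```

If the sequential number is close to the `redis-cli --latency` figure while the pipelined number is far lower, the cost is dominated by round trips rather than by HGETALL itself.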
The hashmap associated with each key (after decoding) has a size of ~1200 bytes, so I ran the following benchmark:
redis-benchmark -h 127.0.0.1 -p 6379 -d 1500 hmset hgetall myhash rand_int rand_string
====== hmset hgetall myhash rand_int rand_string ======
100000 requests completed in 1.45 seconds
50 parallel clients
1500 bytes payload
keep alive: 1
100.00% <= 1 milliseconds
100.00% <= 1 milliseconds
69060.77 requests per second
This seems to support that my latency is very high, but doesn't really tell me why.
The Redis HGETALL command returns all the fields and values of the hash stored at the key. In the returned value, every field name is followed by its value, so the length of the reply is twice the size of the hash.
Note that HGETALL returns an empty map rather than nil when the key doesn't exist.
The Redis HMGET command returns the values associated with the specified fields of the hash stored at the key; for fields that do not exist, a nil value is returned.
One conclusion I drew from the way I was using Redis is that we should not store each transaction in its own hash (i.e. one hash per transaction).
Each individual hget request incurs a network round trip that slows the query down.
Given the way Redis is designed, I think it is faster to store everything inside one hash, i.e. all transactions under the same hash.
The granular data for each transaction can then be stored in its field's value as JSON.
For 140mb worth of data, comparing the time to retrieve all the hashes versus the time to retrieve all the values stored inside one hash:
Instead of 1 000 000 000 iterations of your for loop (if you have 1 000 000 000 hashes), the proposed solution needs only one call (more if you can segregate your data based on an intrinsic value), hence significantly reducing the query time.
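The single-hash layout proposed above can be sketched as follows. This assumes each record fits in one JSON string; `BIG_HASH`, `store_records`, and `load_records` are hypothetical names, and `db` stands for a redis-py client. All records live as fields of one hash, so a batch lookup becomes a single HMGET round trip:

```python
import json

BIG_HASH = 'transactions'  # hypothetical single hash holding every record

def store_records(db, records):
    """records: dict mapping id -> dict of fields; each stored as one JSON field."""
    mapping = {str(k): json.dumps(v) for k, v in records.items()}
    db.hset(BIG_HASH, mapping=mapping)

def load_records(db, ids):
    """One HMGET fetches all requested ids; missing ids come back as None."""
    raw = db.hmget(BIG_HASH, [str(i) for i in ids])
    return {i: (json.loads(r) if r is not None else None)
            for i, r in zip(ids, raw)}
```

The trade-off is that per-field operations on a single record (HGET, HINCRBY on one attribute) are no longer possible without rewriting the whole JSON blob.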