So I have a small and simple Redis database. It contains 136689 keys whose values are hash maps containing 27 fields. I'm accessing the table through the Python interface on the server node, and need to load about 1000-1500 values per call (eventually I will see around 10 requests per second). A simple call looks something like this:
# below, keys is a list of approximately 1000 integers,
# not all of which are in the table
import redis

db = redis.StrictRedis(
    host='127.0.0.1',
    port=6379,
    db=0,
    socket_timeout=1,
    socket_connect_timeout=1,
    decode_responses=True
)

with db.pipeline() as pipe:
    for key in keys:
        pipe.hgetall(key)
    results = zip(keys, pipe.execute())
The overall time for this is ~328ms with an average time per request of ~0.25ms.
Question: This is very slow for a small database and relatively few queries per second. Is there something wrong with my configuration or the way I'm calling the server? Can something be done to make this faster? I don't expect the table to get much bigger so I'm perfectly happy sacrificing disk space for speed.
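One client-side variation worth measuring: redis-py pipelines are transactional by default (the batch is wrapped in MULTI/EXEC), and very large batches can be split into chunks. Below is a minimal sketch of a non-transactional, chunked version of the call above; `fetch_all` and `chunked` are hypothetical helper names, and `db` is assumed to be the client from the snippet above:

```python
def chunked(seq, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def fetch_all(db, keys, chunk_size=500):
    """Fetch one hash per key, using one non-transactional pipeline per chunk."""
    results = []
    for chunk in chunked(keys, chunk_size):
        # transaction=False skips the MULTI/EXEC wrapper redis-py adds by default
        with db.pipeline(transaction=False) as pipe:
            for key in chunk:
                pipe.hgetall(key)
            results.extend(pipe.execute())
    return dict(zip(keys, results))
```

Whether this helps depends on where the time actually goes; it only removes the transaction bookkeeping, not the serialization or round-trip cost.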
Calling hgetall on each key (without the pipeline) is slower (as expected) and reveals that the time distribution is bimodal: the smaller peak corresponds to keys that are not in the table, and the bigger one to keys that are.
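That split can be checked directly by timing each call and bucketing the latencies by whether the key was found. A sketch, assuming the `db` client from above (`split_latencies` is a hypothetical helper name):

```python
import time

def split_latencies(db, keys):
    """Time HGETALL per key; bucket latencies by whether the key existed."""
    hits, misses = [], []
    for key in keys:
        start = time.perf_counter()
        value = db.hgetall(key)  # empty dict means the key was absent
        elapsed = time.perf_counter() - start
        (hits if value else misses).append(elapsed)
    return hits, misses
```

Plotting histograms of the two buckets should reproduce the two peaks separately.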
My conf file is as follows:
port 6379
daemonize yes
save ""
bind 127.0.0.1
tcp-keepalive 300
dbfilename mytable.rdb
dir .
rdbcompression yes
appendfsync no
no-appendfsync-on-rewrite yes
loglevel notice
I launch the server with:
> echo never > /sys/kernel/mm/transparent_hugepage/enabled
> redis-server myconf.conf
I've also measured the intrinsic latency with redis-cli --intrinsic-latency 100
which gives:
Max latency so far: 1 microseconds.
Max latency so far: 10 microseconds.
Max latency so far: 11 microseconds.
Max latency so far: 12 microseconds.
Max latency so far: 18 microseconds.
Max latency so far: 32 microseconds.
Max latency so far: 34 microseconds.
Max latency so far: 38 microseconds.
Max latency so far: 48 microseconds.
Max latency so far: 52 microseconds.
Max latency so far: 60 microseconds.
Max latency so far: 75 microseconds.
Max latency so far: 94 microseconds.
Max latency so far: 120 microseconds.
Max latency so far: 281 microseconds.
Max latency so far: 413 microseconds.
Max latency so far: 618 microseconds.
1719069639 total runs (avg latency: 0.0582 microseconds / 58.17 nanoseconds per run).
Worst run took 10624x longer than the average latency.
This suggests that I should be able to get much better latency. However, when I check the server latency with: > redis-cli --latency -h 127.0.0.1 -p 6379
I get min: 0, max: 2, avg: 0.26 (2475 samples)
This suggests that the ~0.25ms per request I'm seeing from Python matches the server latency reported by the CLI, but both still seem very slow.
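One way to separate per-command round-trip cost from Python client overhead is to time sequential PINGs against a pipelined batch of PINGs from the same process. A sketch, assuming the `db` client from above (`time_pings` and `time_pipelined` are hypothetical helper names):

```python
import time

def time_pings(db, n=1000):
    """Average cost of n sequential PINGs: one network round trip each."""
    start = time.perf_counter()
    for _ in range(n):
        db.ping()
    return (time.perf_counter() - start) / n

def time_pipelined(db, n=1000):
    """Average per-command cost when all n PINGs share one round trip."""
    start = time.perf_counter()
    with db.pipeline(transaction=False) as pipe:
        for _ in range(n):
            pipe.ping()
        pipe.execute()
    return (time.perf_counter() - start) / n
```

If the sequential number is close to the `redis-cli --latency` figure while the pipelined number is far lower, the cost is dominated by round trips rather than by HGETALL itself.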
The hashmap associated with each key (after decoding) has a size of ~1200 bytes, so I ran the following benchmark:
redis-benchmark -h 127.0.0.1 -p 6379 -d 1500 hmset hgetall myhash rand_int rand_string
====== hmset hgetall myhash rand_int rand_string ======
100000 requests completed in 1.45 seconds
50 parallel clients
1500 bytes payload
keep alive: 1
100.00% <= 1 milliseconds
100.00% <= 1 milliseconds
69060.77 requests per second
This seems to support that my latency is very high, but doesn't really tell me why.
The Redis HGETALL command returns all the fields and values of the hash stored at the key. In the returned value, every field name is followed by its value, so the length of the reply is twice the size of the hash.
Note that HGETALL returns an empty map rather than nil when the key doesn't exist.
The Redis HMGET command returns the values associated with the specified fields of the hash stored at the key; for fields that do not exist, a nil value is returned.
One conclusion I drew from the way I was using Redis is that we should not store each transaction in its own hash (i.e. one hash per transaction).
Each individual hget request incurs a network round trip that slows the query down.
Given the way Redis is designed, I think it is faster to store everything inside one hash, i.e. all transactions under the same hash.
The granular data for each transaction can then be stored in its field's value as JSON.
For 140mb worth of data, comparing the time to retrieve all the hashes versus the time to retrieve all the values stored inside one hash:
Instead of 1 000 000 000 iterations of your for loop (if you have 1 000 000 000 hashes), the proposed solution needs only one call (more if you can segregate your data based on an intrinsic value), hence significantly reducing the query time.
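The single-hash layout proposed above can be sketched as follows. This assumes each record fits in one JSON string; `BIG_HASH`, `store_records`, and `load_records` are hypothetical names, and `db` stands for a redis-py client. All records live as fields of one hash, so a batch lookup becomes a single HMGET round trip:

```python
import json

BIG_HASH = 'transactions'  # hypothetical single hash holding every record

def store_records(db, records):
    """records: dict mapping id -> dict of fields; each stored as one JSON field."""
    mapping = {str(k): json.dumps(v) for k, v in records.items()}
    db.hset(BIG_HASH, mapping=mapping)

def load_records(db, ids):
    """One HMGET fetches all requested ids; missing ids come back as None."""
    raw = db.hmget(BIG_HASH, [str(i) for i in ids])
    return {i: (json.loads(r) if r is not None else None)
            for i, r in zip(ids, raw)}
```

The trade-off is that per-field operations on a single record (HGET, HINCRBY on one attribute) are no longer possible without rewriting the whole JSON blob.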