I have a Redis db with about 350,000 keys. Currently my code just loops through all the keys and gets each one's value from the db.
However this takes almost 2 minutes, which seems really slow; redis-benchmark reports
roughly 100k requests per 3 seconds.
I've looked at pipelining, but I need each value returned so that I end up with a dict of key/value pairs.
At the moment I'm thinking of using threading in my code, if possible, to speed this up. Is this the best way to handle this use case?
Here's the code I have so far.
import redis, timeit

start_time = timeit.default_timer()

count = redis.Redis(host='127.0.0.1', port=6379, db=9)
keys = count.keys()
data = {}
for key in keys:
    value = count.get(key)
    if value:
        data[key.decode('utf-8')] = int(value.decode('utf-8'))

elapsed = timeit.default_timer() - start_time
print('Time to read {} records: '.format(len(keys)), elapsed)
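(For reference, a pipeline does return each command's reply in order, so it can still build the dict; a minimal sketch of the same loop against the same host and db:)

import redis

client = redis.Redis(host='127.0.0.1', port=6379, db=9)
keys = client.keys()

# Queue all GETs; execute() returns the replies in the same order they were queued.
pipe = client.pipeline()
for key in keys:
    pipe.get(key)
values = pipe.execute()

# Pair each key with its reply, skipping keys with no value.
data = {k.decode('utf-8'): int(v.decode('utf-8'))
        for k, v in zip(keys, values) if v}

In practice you would flush the pipeline every few thousand commands rather than queueing all 350,000 GETs at once.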
To list the keys in the Redis data store, use the KEYS command with a pattern; Redis returns every key matching it, so the asterisk pattern (*) matches all the keys in the database.
GET returns the value of a single key holding a string. When we need the values of multiple keys, we can use the MGET command instead.
Similar bulk commands exist for other types: HGETALL fetches all the fields stored in a hash, and SMEMBERS fetches all the members of a set. The keys in Redis themselves are stored in a dictionary (a hash table).
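A minimal redis-py sketch combining KEYS and MGET (the host and db here are assumptions):

import redis

client = redis.Redis(host='127.0.0.1', port=6379, db=9)

# One KEYS call, then one MGET round trip for all the values.
keys = client.keys('*')
values = client.mget(keys) if keys else []
data = dict(zip(keys, values))

For 350,000 keys a single MGET reply gets large, which is why the chunked version later in this thread fetches 10,000 keys at a time.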
First, the fastest way is to do all of this inside Redis with EVAL.
Next, the recommended approach for iterating all keys is SCAN. It won't iterate faster than KEYS, but it lets Redis serve other requests in between, so it helps overall application behavior.
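On the client side that looks roughly like this with redis-py's scan_iter helper (a sketch; COUNT is just a hint to Redis about batch size):

import redis

client = redis.Redis(host='127.0.0.1', port=6379, db=9)

data = {}
# scan_iter wraps the SCAN cursor loop for you.
for key in client.scan_iter(count=1000):
    value = client.get(key)
    if value:
        data[key.decode('utf-8')] = value.decode('utf-8')

This still pays one round trip per GET, so in practice you would batch the scanned keys into MGET or pipeline calls, as shown elsewhere in this thread.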
The EVAL script will be something like:

local data = {}
local i = 1
local mykeys = redis.call("KEYS", "*")
for k = 1, #mykeys do
    local tmpkey = mykeys[k]
    data[i] = {tmpkey, redis.call("GET", tmpkey)}
    i = i + 1
end
return data
It will fail, however, if you have keys that are inaccessible with GET (such as sets or lists), so you need to add error handling to it; a sketch follows the sample output below. If you need sorting, you can do it either directly in Lua or later on the client side. The latter is slower for you, but it won't make other users of the Redis instance wait.
Sample output:
127.0.0.1:6370> eval "local data={} local i=1 local mykeys=redis.call(\"KEYS\",\"*\") for k=1,#mykeys do local tmpkey=mykeys[k] data[i]={tmpkey,redis.call(\"GET\",tmpkey)} i=i+1 end return data" 0
1) 1) "a"
2) "aval"
2) 1) "b"
2) "bval"
3) 1) "c"
2) "cval"
4) 1) "d"
2) "dval"
5) 1) "e"
2) "eval"
6) 1) "f"
2) "fval"
7) 1) "g"
2) "gval"
8) 1) "h"
2) "hval"
I had the same problem and ended up using KEYS and MGET to fetch multiple keys at a time:
import redis

url = 'redis://my.redis.url'
query = 'product:*'

client = redis.StrictRedis.from_url(url, decode_responses=True)
keys = client.keys(query)

def chunks(lst, n):
    # Yield successive n-sized chunks from lst.
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

partitions = list(chunks(keys, 10000))

data = []
for batch in partitions:
    # One MGET round trip per 10,000-key batch.
    values = client.mget(batch)
    data.extend(zip(batch, values))

print(len(data))
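Since the question wanted a dict of key/value pairs, the list of tuples converts directly:

result = dict(data)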
I've written a blog post on showing progress while writing the results to a file.
This code is also the basis for the redis-mass-get Python package, which can be used to do the same, like this:
from redis_mass_get import RedisQuery

q = RedisQuery("redis://my.amazing.redis.url")
# query data; returns the result or None
data = q.query("product:*")
# data is returned as:
# [(key1, value1), (key2, value2)]