Faster way to iterate all keys and values in redis db

I have a db with about 350,000 keys. Currently my code just loops through all keys and fetches each value from the db.

However, this takes almost 2 minutes, which seems really slow; redis-benchmark gave 100k reqs/3s.

I've looked at pipelining but I need each value returned so that I end up with a dict of key, value pairs.

At the moment I'm thinking of using threading in my code, if possible, to speed this up. Is this the best way to handle this use case?

Here's the code I have so far.

import redis, timeit
start_time = timeit.default_timer()
count = redis.Redis(host='127.0.0.1', port=6379, db=9)
keys = count.keys()

data = {}

for key in keys:
    value = count.get(key)
    if value:
        data[key.decode('utf-8')] = int(value.decode('utf-8'))

elapsed = timeit.default_timer() - start_time

print('Time to read {} records: '.format(len(keys)), elapsed)
Jonathan asked May 23 '18 10:05



2 Answers

First, the fastest way is to do all of this inside EVAL, so that the work runs on the server in a single round trip.

Next, the recommended way to iterate over all keys is SCAN. It will not iterate faster than KEYS, but it lets Redis process other commands in between, so it helps overall application behavior.
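A minimal sketch of the SCAN approach from Python, assuming the redis-py client. The function name `scan_all` and the `page_size` default are hypothetical choices for illustration; `scan` and `mget` are the redis-py calls used. Each SCAN page is fetched with one MGET, so the number of round trips depends on the page size rather than on the number of keys.

```python
def scan_all(client, pattern="*", page_size=1000):
    """Collect {key: value} for string keys using SCAN pages plus one MGET per page."""
    data = {}
    cursor = 0
    while True:
        # SCAN returns the next cursor and a (possibly empty) page of keys.
        cursor, keys = client.scan(cursor=cursor, match=pattern, count=page_size)
        if keys:
            # MGET returns None for keys that are missing or do not hold a string.
            for key, value in zip(keys, client.mget(keys)):
                if value is not None:
                    data[key] = value
        if cursor == 0:  # a cursor of 0 means the iteration is complete
            break
    return data

# Example usage (assumes redis-py is installed and a server is reachable):
#   import redis
#   client = redis.Redis(host="127.0.0.1", port=6379, db=9, decode_responses=True)
#   data = scan_all(client)
```

SCAN may return the same key more than once during an iteration; collecting into a dict makes such duplicates harmless.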

The EVAL script will be something like

local data={} local i=1 local mykeys=redis.call("KEYS","*") for k=1,#mykeys do local tmpkey=mykeys[k] data[i]={tmpkey,redis.call("GET",tmpkey)} i=i+1 end return data

but it will fail if you have keys that GET cannot read (such as sets or lists), so you need to add error handling to it. If you need sorting, you can do it either in Lua directly or later on the client side. The second option will be slower, but it will not make other users of the Redis instance wait.

Sample output:

127.0.0.1:6370> eval "local data={} local i=1 local mykeys=redis.call(\"KEYS\",\"*\") for k=1,#mykeys do local tmpkey=mykeys[k] data[i]={tmpkey,redis.call(\"GET\",tmpkey)} i=i+1 end return data" 0
1) 1) "a"
   2) "aval"
2) 1) "b"
   2) "bval"
3) 1) "c"
   2) "cval"
4) 1) "d"
   2) "dval"
5) 1) "e"
   2) "eval"
6) 1) "f"
   2) "fval"
7) 1) "g"
   2) "gval"
8) 1) "h"
   2) "hval"
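One possible way to add the error handling mentioned above is to check each key's type inside the script and skip anything that is not a string. The sketch below assumes the redis-py client; the names `LUA_DUMP_STRINGS` and `dump_strings` are hypothetical, and the script is a variation on the one shown, not the answerer's exact code. Like the original, it still uses KEYS and blocks the server while it runs.

```python
# Lua script: return {key, value} pairs for string keys matching ARGV[1].
LUA_DUMP_STRINGS = """
local data = {}
for _, key in ipairs(redis.call("KEYS", ARGV[1])) do
    -- TYPE returns a status reply; its text is in the .ok field.
    if redis.call("TYPE", key).ok == "string" then
        data[#data + 1] = {key, redis.call("GET", key)}
    end
end
return data
"""

def dump_strings(client, pattern="*"):
    """Run the script once and turn the list of pairs into a dict."""
    # numkeys is 0 because the pattern is passed as an argument, not a key name.
    reply = client.eval(LUA_DUMP_STRINGS, 0, pattern)
    return {key: value for key, value in reply}

# Example usage (assumes redis-py is installed and a server is reachable):
#   import redis
#   client = redis.Redis(host="127.0.0.1", port=6379, decode_responses=True)
#   data = dump_strings(client)
```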
Imaskar answered Oct 05 '22 02:10


I had the same problem and ended up using KEYS and MGET to fetch multiple keys at a time:

import redis
url='redis://my.redis.url'
query='product:*'

client = redis.StrictRedis.from_url(url, decode_responses=True)
keys = client.keys(query)

def chunks(lst, n):
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

partitions = list(chunks(keys, 10000))

data = []
for chunk in partitions:
    # MGET fetches a whole chunk of values in one round trip
    values = client.mget(chunk)
    data.extend(zip(chunk, values))

print(len(data))
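The same chunking idea also works with a pipeline of GET commands, because a pipeline's `execute()` returns one reply per queued command, in order. This is a hedged sketch assuming redis-py; `get_many` and its `batch_size` default are hypothetical names for illustration.

```python
def get_many(client, keys, batch_size=10000):
    """Fetch values for `keys` with pipelined GETs and return {key: value}."""
    data = {}
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        # transaction=False sends plain pipelined commands without MULTI/EXEC.
        pipe = client.pipeline(transaction=False)
        for key in batch:
            pipe.get(key)
        # execute() returns the replies in the order the commands were queued.
        for key, value in zip(batch, pipe.execute()):
            if value is not None:  # None means the key no longer exists
                data[key] = value
    return data

# Example usage (assumes redis-py is installed and a server is reachable):
#   import redis
#   client = redis.Redis(host="127.0.0.1", port=6379, db=9, decode_responses=True)
#   data = get_many(client, client.keys("product:*"))
```

With the default `raise_on_error=True`, `execute()` raises if any queued GET hits a key of the wrong type, so this sketch assumes every key holds a string.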

I've written a blog post about showing progress while writing the result to a file.

This code is the basis for the redis-mass-get Python package, which can be used to do the same thing like this:

from redis_mass_get import RedisQuery

# pluralize will return the result or None
q = RedisQuery("redis://my.amazing.redis.url")

# query data 
data = q.query("product:*")
# data is returned as:
# [(key1, value1), (key2, value2)]
Kees C. Bakker answered Oct 05 '22 02:10