I'm using redis on an AI project.
The idea is to have multiple environment simulators running policies on a lot of cpu cores. The simulators write experience (a list of state/action/reward tuples) to a redis server (replay buffer). Then a training process reads the experience as a dataset to generate a new policy. New policy is deployed to the simulators, data from previous run is deleted, and the process continues.
The bulk of the experience is captured in the "state", which is normally represented as a large NumPy array of dimension, say, 80 x 80. The simulators generate these as fast as the CPU will allow.
To this end, does anyone have good ideas or experience of the best/fastest/simplest way to write a lot of NumPy arrays to Redis? This is all on the same machine for now, but it could later run on a set of cloud servers. Code samples welcome!
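To make the data flow concrete, here is a rough sketch of the loop I have in mind. The key name and the use of pickle are placeholders; serializing the big state arrays efficiently is exactly the step I'm asking about:

import pickle
import numpy as np
import redis

r = redis.Redis(host='localhost', port=6379, db=0)

# Simulator side: append one (state, action, reward) tuple per step
state, action, reward = np.zeros((80, 80), dtype=np.uint16), 3, 1.0
r.rpush('experience', pickle.dumps((state, action, reward)))

# Training side: read the whole buffer as a dataset, then delete it
batch = [pickle.loads(item) for item in r.lrange('experience', 0, -1)]
r.delete('experience')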
I don't know if it is fastest, but you could try something like this...
Storing a Numpy array to Redis is done with the function toRedis() below, and retrieving one with fromRedis():
#!/usr/bin/env python3
import struct
import redis
import numpy as np

def toRedis(r, a, n):
    """Store given Numpy array 'a' in Redis under key 'n'"""
    h, w = a.shape
    shape = struct.pack('>II', h, w)
    encoded = shape + a.tobytes()
    # Store encoded data in Redis
    r.set(n, encoded)
    return

def fromRedis(r, n):
    """Retrieve Numpy array from Redis key 'n'"""
    encoded = r.get(n)
    h, w = struct.unpack('>II', encoded[:8])
    # Skip the 8-byte shape header and decode with the same dtype the
    # array was stored with, or the result will differ from the original
    a = np.frombuffer(encoded[8:], dtype=np.uint16).reshape(h, w)
    return a

# Create 80x80 numpy array to store
a0 = np.arange(6400, dtype=np.uint16).reshape(80, 80)

# Redis connection
r = redis.Redis(host='localhost', port=6379, db=0)

# Store array a0 in Redis under name 'a0array'
toRedis(r, a0, 'a0array')

# Retrieve from Redis
a1 = fromRedis(r, 'a0array')

np.testing.assert_array_equal(a0, a1)
You could add more flexibility by encoding the dtype of the Numpy array along with the shape. I didn't do that because you may already know all your arrays are of one specific type, in which case the code would just be bigger and harder to read for no reason.
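If you did want that flexibility, a sketch of the variant (hypothetical helper names, reusing the imports from the script above) could pack the dtype name into the header:

def toRedisTyped(r, a, n):
    """Like toRedis(), but records the dtype name in the header so the
    reader does not have to assume np.uint16 (hypothetical helper)."""
    dtype_name = a.dtype.name.encode('utf-8')
    h, w = a.shape
    header = struct.pack('>III', len(dtype_name), h, w) + dtype_name
    r.set(n, header + a.tobytes())

def fromRedisTyped(r, n):
    """Retrieve an array stored with toRedisTyped()."""
    encoded = r.get(n)
    dlen, h, w = struct.unpack('>III', encoded[:12])
    dtype = np.dtype(encoded[12:12 + dlen].decode('utf-8'))
    return np.frombuffer(encoded[12 + dlen:], dtype=dtype).reshape(h, w)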
Rough benchmark on a modern iMac:
80x80 Numpy array of np.uint16 => 58 microseconds to write
200x200 Numpy array of np.uint16 => 88 microseconds to write
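For what it's worth, the write figure can be reproduced along these lines with timeit, continuing from the script above (the run count is arbitrary):

import timeit

n_runs = 10000
t = timeit.timeit(lambda: toRedis(r, a0, 'a0array'), number=n_runs)
print(f'{t / n_runs * 1e6:.1f} microseconds per write')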
Keywords: Python, Numpy, Redis, array, serialise, serialize, key, incr, unique
You could also consider using msgpack-numpy, which provides "encoding and decoding routines that enable the serialization and deserialization of numerical and array data types provided by numpy using the highly efficient msgpack format." -- see https://msgpack.org/.
Quick proof-of-concept:
import msgpack
import msgpack_numpy as m
import numpy as np
m.patch() # Important line to monkey-patch for numpy support!
from redis import Redis
r = Redis('127.0.0.1')
# Create an array, then use msgpack to serialize it
d_orig = np.array([1,2,3,4])
d_orig_packed = m.packb(d_orig)
# Set the data in redis
r.set('d', d_orig_packed)
# Retrieve and unpack the data
d_out = m.unpackb(r.get('d'))
# Check they match
assert np.all(d_orig == d_out)
assert d_orig.dtype == d_out.dtype
On my machine, msgpack runs much quicker than using struct:
In: %timeit struct.pack('4096L', *np.arange(0, 4096))
1000 loops, best of 3: 443 µs per loop
In: %timeit m.packb(np.arange(0, 4096))
The slowest run took 7.74 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 32.6 µs per loop
You can check Mark Setchell's answer for how to actually write the bytes into Redis. Below, I rewrite the functions fromRedis and toRedis to account for arrays of variable dimension size and to also include the array shape.
def toRedis(arr: np.ndarray) -> bytes:
    """Serialize an array of any shape/dtype as dtype|shape|raw-bytes."""
    arr_dtype = bytearray(str(arr.dtype), 'utf-8')
    arr_shape = bytearray(','.join([str(a) for a in arr.shape]), 'utf-8')
    sep = bytearray('|', 'utf-8')
    arr_bytes = arr.ravel().tobytes()
    to_return = arr_dtype + sep + arr_shape + sep + arr_bytes
    return bytes(to_return)

def fromRedis(serialized_arr: bytes) -> np.ndarray:
    """Invert toRedis() by splitting on the first two '|' separators."""
    sep = '|'.encode('utf-8')
    i_0 = serialized_arr.find(sep)
    i_1 = serialized_arr.find(sep, i_0 + 1)
    arr_dtype = serialized_arr[:i_0].decode('utf-8')
    arr_shape = tuple([int(a) for a in serialized_arr[i_0 + 1:i_1].decode('utf-8').split(',')])
    arr_bytes = serialized_arr[i_1 + 1:]
    arr = np.frombuffer(arr_bytes, dtype=arr_dtype).reshape(arr_shape)
    return arr
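A hypothetical round trip through Redis with these functions (the key name and array are made up):

import numpy as np
import redis

r = redis.Redis(host='localhost', port=6379, db=0)

a0 = np.random.rand(3, 4, 5).astype(np.float32)  # arbitrary rank and dtype
r.set('a0array', toRedis(a0))
a1 = fromRedis(r.get('a0array'))
np.testing.assert_array_equal(a0, a1)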
Give plasma a try as it avoids serialization/deserialization overhead.
Install plasma using pip install pyarrow
Documentation: https://arrow.apache.org/docs/python/plasma.html
First, launch the plasma store with 1 GB of memory (in a terminal):
plasma_store -m 1000000000 -s /tmp/plasma
import pyarrow.plasma as pa
import numpy as np

client = pa.connect("/tmp/plasma")
temp = np.random.rand(80, 80)
object_id = client.put(temp)     # write: returns an ObjectID
fetched = client.get(object_id)  # fetch back from shared memory
Write time: 130 µs vs 782 µs (Redis implementation: Mark Setchell's answer)
Write time can be improved by using plasma huge pages, but this is only available on Linux machines: https://arrow.apache.org/docs/python/plasma.html#using-plasma-with-huge-pages
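Per the linked documentation, enabling huge pages looks roughly like this (Linux only; sizes are illustrative):

gid=`id -g`
uid=`id -u`
sudo mkdir -p /mnt/hugepages
sudo mount -t hugetlbfs -o uid=$uid,gid=$gid none /mnt/hugepages
sudo bash -c "echo $gid > /proc/sys/vm/hugetlb_shm_group"
sudo bash -c "echo 20000 > /proc/sys/vm/nr_hugepages"
plasma_store -s /tmp/plasma -m 1000000000 -d /mnt/hugepages -h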
Fetch time: 31.2 µs vs 99.5 µs (Redis implementation: Mark Setchell's answer)
PS: Code was run on a Mac Pro