
Can Redis handle this simple query?

I've been investigating Redis's functional capabilities compared to a relational database, without getting into non-functional requirements (NFRs) such as response time and scalability, where I understand that Redis excels.

Here is, for example, a list of use cases that Redis can handle for web applications.
That said, one known disadvantage of Redis is business analytics, but how complex do the analytics have to be before Redis becomes less efficient than, for example, MySQL?

For example, given the following data structure in MySQL:
Table: User Columns: Id(PK), Name(VarChar), Age(Int)
Table: Message Columns: UserID(FK), Content(VarChar), Importance(Int)

and in my application I want to use the following two queries:

 1. SELECT Content FROM Message WHERE Importance > 2;
 2. SELECT Content FROM Message, User WHERE User.Id = Message.UserID AND
    User.Age > 30;
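For reference, the relational side can be reproduced with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: SQLite stands in for MySQL, and the sample rows (user 124 and the 'qux' and 'low' messages) are hypothetical; user 123 and the 'bar' message mirror the values used in the answer.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the schema mirrors the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE User (Id INTEGER PRIMARY KEY, Name VARCHAR(64), Age INT);
    CREATE TABLE Message (UserID INT REFERENCES User(Id),
                          Content VARCHAR(255), Importance INT);
    -- Hypothetical sample rows
    INSERT INTO User VALUES (123, 'foo', 31), (124, 'baz', 25);
    INSERT INTO Message VALUES (123, 'bar', 3), (124, 'qux', 5), (123, 'low', 1);
""")

# Query 1: content of messages with importance greater than 2
q1 = [content for (content,) in conn.execute(
    "SELECT Content FROM Message WHERE Importance > 2")]

# Query 2: content of messages whose author is older than 30 (implicit join)
q2 = [content for (content,) in conn.execute(
    "SELECT Content FROM Message, User "
    "WHERE User.Id = Message.UserID AND User.Age > 30")]

print(sorted(q1))  # ['bar', 'qux']
print(sorted(q2))  # ['bar', 'low']
```

In SQL the join and the filters are declared once and the engine picks the access path; the Redis approach below has to build those access paths by hand.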

My Question:
Can I use Redis to store the data structure above and query it with the same (or better) efficiency as MySQL?

Asked Feb 11 '23 08:02 by GyRo


1 Answer

Short answer: yes.

Long answer: Redis is an amazing piece of technology, but it is not a relational database. NoSQL databases, Redis included, are built on the premise that data should be stored according to the patterns used to access it. Therefore, to accomplish the above you'll first have to store the data "correctly".

To store your tables' rows, you'll want to use the Hash data structure. In Redis syntax, here's how you'd create a User key for user ID 123:

HMSET user:123 id 123 name foo age 31

Note 1: the use of a colon (':') in constructing the key's name is merely a convention.
Note 2: while the ID is already part of the key's name, it is common to also include it as a field in the Hash for easier access.

Similarly, here's how you'll create a Message key (with the ID 987):

HMSET message:987 id 987 userid 123 content bar importance 3
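When migrating many existing rows, the Redis documentation describes a "mass insertion" approach: generate commands in the Redis protocol (RESP) and pipe them into `redis-cli --pipe`. The sketch below assumes rows arrive as Python dicts; `row_to_hset` and `to_resp` are hypothetical helper names. It emits `HSET` with several field/value pairs, which Redis 4.0 and later accept and which replaces the deprecated `HMSET`. All values are sent as strings, since that is how hash fields are stored.

```python
import sys

def row_to_hset(entity, row):
    """Build HSET args for one row, using the '<entity>:<id>' key convention."""
    args = ["HSET", f"{entity}:{row['id']}"]
    for field, value in row.items():
        args += [field, str(value)]  # hash values are stored as strings
    return args

def to_resp(args):
    """Encode one command as a RESP array of bulk strings."""
    out = [f"*{len(args)}\r\n"]
    for arg in args:
        # A bulk string's length prefix counts bytes, not characters
        out.append(f"${len(arg.encode('utf-8'))}\r\n{arg}\r\n")
    return "".join(out)

# Hypothetical rows, mirroring the keys created above
users = [{"id": 123, "name": "foo", "age": 31}]
messages = [{"id": 987, "userid": 123, "content": "bar", "importance": 3}]

payload = "".join(to_resp(row_to_hset("user", u)) for u in users)
payload += "".join(to_resp(row_to_hset("message", m)) for m in messages)

# Usage: python load.py | redis-cli --pipe
# On POSIX, text-mode stdout leaves \r\n untouched; on Windows,
# write payload.encode() to sys.stdout.buffer instead.
sys.stdout.write(payload)
```

The index entries from the following steps (`ZADD`, `SADD`) can be emitted the same way, so a single pipe loads rows and indices together.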

Now comes the fun part :) Redis doesn't have foreign keys or secondary indices, so you'll have to maintain data structures that help you fetch the data according to your requirements. For your first query, the best choice is a Sorted Set in which the members are the message IDs and the scores are their importance values. Therefore do:

ZADD messages_by_importance 3 987
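Why a Sorted Set? It keeps members ordered by score, so a score range can be located with a search and then read sequentially instead of scanning every message. For larger sets Redis uses a skip list paired with a hash table. The hypothetical `ScoreIndex` class below is only an in-memory stand-in: it keeps a sorted Python list and uses binary search to mimic `ZADD` and `ZRANGEBYSCORE`, including the `(` prefix that makes a bound exclusive. Unlike Redis's O(log N) `ZADD`, list insertion here is O(N); only the range lookup has the same O(log N + M) shape.

```python
import bisect

class ScoreIndex:
    """Hypothetical in-memory stand-in for a Redis Sorted Set."""

    def __init__(self):
        self._entries = []   # (score, member) pairs in sorted order
        self._keys = []      # scores only, aligned with _entries, for bisect
        self._score_of = {}  # member -> current score

    def zadd(self, score, member):
        # Re-adding a member moves it to its new score, as ZADD does
        if member in self._score_of:
            i = bisect.bisect_left(self._entries, (self._score_of[member], member))
            del self._entries[i]
            del self._keys[i]
        i = bisect.bisect_left(self._entries, (score, member))
        self._entries.insert(i, (score, member))
        self._keys.insert(i, score)
        self._score_of[member] = score

    def zrangebyscore(self, low, high, low_exclusive=False):
        # Binary search for the first in-range entry, then slice up to high
        start = (bisect.bisect_right(self._keys, low) if low_exclusive
                 else bisect.bisect_left(self._keys, low))
        end = bisect.bisect_right(self._keys, high)
        return [member for _, member in self._entries[start:end]]

idx = ScoreIndex()
idx.zadd(3, "987")
idx.zadd(2, "988")  # hypothetical extra messages
idx.zadd(5, "989")
# Equivalent of: ZRANGEBYSCORE messages_by_importance (2 +inf
print(idx.zrangebyscore(2, float("inf"), low_exclusive=True))  # ['987', '989']
```

Members with equal scores are ordered lexicographically, which matches how Redis orders ties.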

Fetching the content of messages with importance greater than 2 takes two steps, a range query on the Sorted Set followed by a Hash lookup per message ID, as shown in this Python snippet:

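import redis

r = redis.Redis(decode_responses=True)  # local server assumed; str replies, not bytes
messages = r.zrangebyscore('messages_by_importance', '(2', '+inf')  # '(' excludes 2
for msg in messages:
    content = r.hget('message:' + msg, 'content')
    print(content)  # stands in for do_something(content)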

Note 3: this snippet is quite naive (it makes one round trip per message) and can be optimized for better performance, but it should give you the basic gist.

For the second query, you'll first need to find users who are older than 30; again, the same Sorted Set trick applies:

ZADD users_by_age 31 123
ZRANGEBYSCORE users_by_age (30 +inf

This will get you the list of all users that match your criterion, but you'll also need to keep track (index) of all messages per user. To do this, use a Set:

SADD user:123:messages 987
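Because these indices are maintained by the application, every write has to touch the Hash and each index that refers to it. A minimal sketch, assuming a hypothetical `message_write_commands` helper: it returns the commands for one new message wrapped in `MULTI`/`EXEC`, so Redis executes them as one transaction and no other client's commands run in between. A change to a user's age would likewise need a `ZADD users_by_age` update.

```python
def message_write_commands(msg_id, user_id, content, importance):
    """Commands that store one message and update both indices atomically."""
    return [
        ["MULTI"],
        # The row itself, as a Hash
        ["HSET", f"message:{msg_id}", "id", str(msg_id), "userid", str(user_id),
         "content", content, "importance", str(importance)],
        # Index for query 1: message IDs scored by importance
        ["ZADD", "messages_by_importance", str(importance), str(msg_id)],
        # Index for query 2: the author's set of message IDs
        ["SADD", f"user:{user_id}:messages", str(msg_id)],
        ["EXEC"],
    ]

for cmd in message_write_commands(987, 123, "bar", 3):
    print(" ".join(cmd))
```

With redis-py, a `pipeline()` created with its default `transaction=True` sends queued commands wrapped in the same `MULTI`/`EXEC` pair.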

To tie everything together, here's another snippet:

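import redis

r = redis.Redis(decode_responses=True)  # str replies, so keys can be concatenated
users = r.zrangebyscore('users_by_age', '(30', '+inf')  # users older than 30
for user in users:
    messages = r.smembers('user:' + user + ':messages')  # that user's message IDs
    for msg in messages:
        content = r.hget('message:' + msg, 'content')
        print(content)  # stands in for do_something(content)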

This should be enough to get you started, but once you've got a firm grip on the basics, look into optimizing these flows. Easy gains come from pipelining, Lua scripting and smarter indices tailored to your needs... and if you need any further assistance, just ask :)
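As one illustration of pipelining, the per-message `HGET` calls can be batched so each query costs a fixed number of round trips instead of one per message. The two query functions assume a redis-py client created with `decode_responses=True` and call only `zrangebyscore`, `smembers`, `pipeline` and `execute`. `FakeRedis` and `FakePipeline` are hypothetical in-memory stand-ins so the sketch runs without a server; they implement only what these functions call.

```python
def important_contents(r, min_exclusive=2):
    """Query 1: two round trips regardless of how many messages match."""
    ids = r.zrangebyscore("messages_by_importance", f"({min_exclusive}", "+inf")
    pipe = r.pipeline(transaction=False)  # plain batching, no MULTI/EXEC
    for msg_id in ids:
        pipe.hget(f"message:{msg_id}", "content")
    return pipe.execute()

def contents_by_older_users(r, min_age_exclusive=30):
    """Query 2: one round trip each for users, their sets, and the contents."""
    users = r.zrangebyscore("users_by_age", f"({min_age_exclusive}", "+inf")
    pipe = r.pipeline(transaction=False)
    for user_id in users:
        pipe.smembers(f"user:{user_id}:messages")
    msg_ids = [m for members in pipe.execute() for m in members]
    pipe = r.pipeline(transaction=False)
    for msg_id in msg_ids:
        pipe.hget(f"message:{msg_id}", "content")
    return pipe.execute()

class FakeRedis:
    """Hypothetical stand-in: hashes, sorted sets and sets held in dicts."""
    def __init__(self, hashes, zsets, sets):
        self.hashes, self.zsets, self.sets = hashes, zsets, sets
    def zrangebyscore(self, key, low, high):
        # Only handles the '(N' .. '+inf' form used above; high is ignored
        bound = float(low.lstrip("("))
        items = sorted(self.zsets.get(key, {}).items(), key=lambda kv: (kv[1], kv[0]))
        return [member for member, score in items if score > bound]
    def smembers(self, key):
        return set(self.sets.get(key, set()))
    def hget(self, key, field):
        return self.hashes.get(key, {}).get(field)
    def pipeline(self, transaction=True):
        return FakePipeline(self)

class FakePipeline:
    """Queues calls and replays them against FakeRedis on execute()."""
    def __init__(self, r):
        self.r, self.calls = r, []
    def smembers(self, key):
        self.calls.append(("smembers", key))
        return self
    def hget(self, key, field):
        self.calls.append(("hget", key, field))
        return self
    def execute(self):
        return [getattr(self.r, name)(*args) for name, *args in self.calls]

r = FakeRedis(
    hashes={"message:987": {"content": "bar"}, "message:988": {"content": "low"}},
    zsets={"messages_by_importance": {"987": 3, "988": 1},
           "users_by_age": {"123": 31}},
    sets={"user:123:messages": {"987", "988"}},
)
print(important_contents(r))               # ['bar']
print(sorted(contents_by_older_users(r)))  # ['bar', 'low']
```

Against a real server, the same two functions accept a `redis.Redis(decode_responses=True)` client unchanged. A Lua script could go further by doing the whole lookup server-side in one call; that is not shown here.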

Answered Feb 20 '23 00:02 by Itamar Haber