How to design Redis data structures to support queries similar to SQL database queries?

Tags:

redis

I have tables like Job and JobInfo, and I want to perform queries like the one below:

"SELECT J.JobID FROM Job J, JobInfo B WHERE B.JobID = J.JobID AND BatchID=5850 AND B.Status=0 AND J.JobType<>2"

How should I design my Redis data types so that I can map such queries onto Redis?

Suppose I map each row of the Job table to a Redis hash, e.g. (hash j jobid 1 status 2), and similarly each row of the JobInfo table to another Redis hash, e.g. (hash jinfo jobid 1 jobtype 3).

Then my tables can be sets of hashes: the Job table can be a set with entries JobSet:jobid, and the JobInfo table can be a set with entries like JobInfoSet:jobid.

But I am confused about what happens when I do a SINTER on JobSet and JobInfoSet. How am I going to query those hashes to get keys? The hash contents behind JobSet are not identical to the hash contents behind JobInfoSet (they may have different key-value pairs).

So what exactly am I going to get as the output of SINTER? And how am I going to query that output as key-value pairs?

So the tables will be collections of Redis hashes.

asked May 16 '16 09:05

user888270

2 Answers

Redis is not designed to structure data the SQL way. Besides being an in-memory key-value store, it supports five types of data structures: Strings, Hashes, Lists, Sets and Sorted Sets. At a high level, this is a sufficient hint that Redis is designed to solve the performance problems that arise from heavy computation in relational data models. However, if you want to execute SQL queries against an in-memory store, you may want to look at MemSQL.

answered Sep 23 '22 14:09

Asheesh

Let's break down the SQL statement into different components, and I'll try to show how Redis can accomplish the various parts.

Select J.JobID, J.JobName from Job J;

We translate each row in "Job" into a hash in Redis, using the SQL primary key as the Redis key. For example, the SQL row

JobID | Name
123   | Fred

In Redis this is HSET Job-123 Name Fred, which can be conceptualized as

Job-123 => {"Name":"Fred"}

Thus we can store columns as hash fields in Redis.

Let's say we do the same thing for JobInfo. Each JobInfo object has its own ID:

JobInfo-876 => {"meta1": "some value", "meta2": "bla", "JobID": "123"}
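As a rough illustration of the row-to-hash mapping above, here is a small Python sketch that models Redis hashes with plain dictionaries instead of talking to a server. The helpers `hset` and `hgetall` are hypothetical stand-ins that only mimic the shape of the Redis commands of the same name.

```python
# In-memory stand-in for Redis hashes: key -> {field: value}.
# This is a model for illustration only, not a Redis client.
store = {}

def hset(key, field, value):
    """Mimic HSET: set one field; return 1 if the field is new, else 0."""
    fields = store.setdefault(key, {})
    is_new = field not in fields
    fields[field] = str(value)  # Redis stores hash values as strings
    return 1 if is_new else 0

def hgetall(key):
    """Mimic HGETALL: return a copy of all fields (empty dict if missing)."""
    return dict(store.get(key, {}))

# One SQL row of Job becomes one hash, keyed by the primary key.
hset("Job-123", "Name", "Fred")

# One row of JobInfo, with the foreign key kept as an ordinary field.
hset("JobInfo-876", "meta1", "some value")
hset("JobInfo-876", "meta2", "bla")
hset("JobInfo-876", "JobID", 123)

print(hgetall("Job-123"))
print(hgetall("JobInfo-876"))
```

Note that nothing here enforces the relationship between JobInfo-876 and Job-123; the JobID field is just a string, which is why a separate index is needed to look rows up by it.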

In SQL we would normally create a secondary index on JobInfo.JobID, but in NoSQL land we maintain our own secondary indexes.

Sorted sets are great for this. When we want to fetch JobInfo objects by some field (JobID in this case), we add each object to a sorted set with that field as the score: ZADD JobInfo-JobID 123 JobInfo-876

This results in a sorted set with one element, JobInfo-876, which has a score of 123. I realize that forcing all JobIDs into the float range of the score is a bad idea (scores are double-precision floats, so integers are only exact up to 2^53), but work with me here.

Now when we want to find all JobInfo objects for a given JobID, we just do an O(log(N)) lookup into the index: ZRANGEBYSCORE JobInfo-JobID 123 123, which returns "JobInfo-876".
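To make the secondary-index idea concrete, the sketch below models a single sorted set in plain Python as a list of (score, member) tuples kept in order, with `bisect` providing the logarithmic seek. It is only a model under simplifying assumptions: the function names mirror the Redis commands but are hypothetical helpers, re-scoring an existing member is not handled, and the extra keys JobInfo-877 and JobInfo-900 are made-up sample data.

```python
import bisect

# In-memory stand-in for the sorted set JobInfo-JobID: (score, member)
# tuples kept sorted. Illustration only, not a Redis client.
index = []

def zadd(score, member):
    """Mimic ZADD for new members: insert while keeping sorted order."""
    entry = (float(score), member)
    pos = bisect.bisect_left(index, entry)
    if pos == len(index) or index[pos] != entry:
        index.insert(pos, entry)

def zrangebyscore(lo, hi):
    """Mimic ZRANGEBYSCORE lo hi (inclusive): members in score order."""
    # (lo, "") sorts before every real member that has score lo.
    start = bisect.bisect_left(index, (float(lo), ""))
    members = []
    for score, member in index[start:]:
        if score > hi:
            break
        members.append(member)
    return members

zadd(123, "JobInfo-876")
zadd(124, "JobInfo-877")  # hypothetical extra row
zadd(123, "JobInfo-900")  # hypothetical second JobInfo for job 123

print(zrangebyscore(123, 123))
```

The lookup cost is one binary search plus the number of matching members, which is the same shape as the cost of the real command.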

Now, to implement simple joins, we reuse this JobInfo-JobID index by also storing Job keys under their JobIDs: ZADD JobInfo-JobID 123 Job-123

Thus a query akin to SELECT J.JobID, J.Name, B.meta1 FROM Job J JOIN JobInfo B USING (JobID)

translates to scanning through the JobInfo-JobID secondary index and regrouping the Job and JobInfo keys returned: ZRANGEBYSCORE JobInfo-JobID -inf +inf WITHSCORES -> (Job-123, 123), (JobInfo-876, 123)

These objects all share the same JobID. Client side, you would then asynchronously fetch the needed fields. Alternatively, you could embed these lookups in a Lua script, but such a script could make Redis hang for a long time, because other clients are blocked while a script runs. Normally Redis tries to be fair with clients and prefers short batched queries over one long query.
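The client-side regrouping step might look roughly like the following Python sketch. It assumes the index scan has already been fetched as (member, score) pairs and that the hashes are available locally; both the `hashes` dictionary and the `join_rows` helper are hypothetical and stand in for the HGET calls a real client would issue.

```python
from itertools import groupby

# Hypothetical local copy of the hashes described above.
hashes = {
    "Job-123": {"Name": "Fred"},
    "JobInfo-876": {"meta1": "some value", "meta2": "bla", "JobID": "123"},
}

# Shape of a WITHSCORES scan over JobInfo-JobID: (member, score) pairs
# ordered by score, then by member name.
scan = [("Job-123", 123.0), ("JobInfo-876", 123.0)]

def join_rows(scan, hashes):
    """Pair each Job with every JobInfo that shares its score (the JobID)."""
    rows = []
    for score, group in groupby(scan, key=lambda pair: pair[1]):
        members = [member for member, _ in group]
        jobs = [m for m in members if m.startswith("Job-")]
        infos = [m for m in members if m.startswith("JobInfo-")]
        for job in jobs:
            for info in infos:
                rows.append(
                    (int(score), hashes[job]["Name"], hashes[info]["meta1"])
                )
    return rows

print(join_rows(scan, hashes))  # [(123, 'Fred', 'some value')]
```

Telling Job keys from JobInfo keys by prefix is a design choice of this sketch; it works here only because both kinds of key live in the same index and follow the naming scheme used above.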

Now we come to a big problem: what if we want to combine secondary indexes? Let's say we have a secondary index on JobInfo.Status and another on Job.JobType. If we make a set of all jobs with the right JobType and use it as a filter on the shared JobInfo-JobID secondary index, then we eliminate not only the unwanted Job elements but also every JobInfo element. We could, I guess, fetch the scores (JobIDs) of the intersection and refetch all JobInfo objects with those scores, but we lose some of the filtering we did.
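The problem can be seen in a few lines of Python. This is a toy model with made-up sample data (the Job-124 and JobInfo-877 entries and the `jobs_with_right_type` set are hypothetical), where a dictionary of member to score stands in for the shared sorted set.

```python
# Hypothetical shared index JobInfo-JobID, modelled as member -> score.
shared_index = {
    "Job-123": 123, "JobInfo-876": 123,
    "Job-124": 124, "JobInfo-877": 124,
}

# Hypothetical secondary index: Job keys whose JobType passes the filter.
jobs_with_right_type = {"Job-123"}

# Step 1: filtering the shared index by the Job set keeps only Job keys,
# so every JobInfo element disappears along with the unwanted Jobs.
filtered = {m: s for m, s in shared_index.items() if m in jobs_with_right_type}

# Step 2: recover the JobInfo side by going back to the index with each
# surviving score (JobID). Any filter applied to JobInfo earlier, such as
# Status, would have to be applied again to these refetched keys.
surviving_ids = set(filtered.values())
refetched = sorted(m for m, s in shared_index.items() if s in surviving_ids)

print(filtered)   # {'Job-123': 123}
print(refetched)  # ['Job-123', 'JobInfo-876']
```

The second pass is the extra round of work described above: the intersection alone cannot return both sides of the join.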

It is at this point that Redis breaks down.

Here is an article on secondary indexes from the creator of Redis himself: http://redis.io/topics/indexes

In it he touches on multi-dimensional indexes for filtering purposes. As you can see, he designed the data structures in a very versatile way. The most appealing property is that sorted set elements with the same score are stored in lexicographical order. Thus you can give all elements a score of 0, piggyback on Redis's speed, and use it more like CockroachDB, which relies on a global order to implement many SQL features.
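A lexicographic index of this kind can be modelled in Python with a sorted list of strings, since members that share a score are ordered by string comparison. In the sketch below, the "status:jobid" member format, the zero padding, and the sample entries are all assumptions made for illustration; `range_by_prefix` is a hypothetical helper that mimics what a ZRANGEBYLEX prefix scan would return.

```python
import bisect

# Stand-in for a sorted set whose members all have score 0, so that order
# is purely lexicographic. JobIDs are zero-padded so that string order
# matches numeric order.
lex_index = sorted(["0:0000123", "0:0000456", "1:0000124", "2:0000789"])

def range_by_prefix(prefix):
    """Return every member that starts with prefix, in index order."""
    start = bisect.bisect_left(lex_index, prefix)
    members = []
    for member in lex_index[start:]:
        if not member.startswith(prefix):
            break
        members.append(member)
    return members

# All entries with Status = 0; the JobID is the part after the colon.
job_ids = [int(m.split(":", 1)[1]) for m in range_by_prefix("0:")]
print(job_ids)  # [123, 456]
```

Encoding more than one field into the member, in a fixed order, is what lets a single index answer a combined filter, at the cost of choosing the field order up front.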

answered Sep 23 '22 14:09

Brian
