Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to design Redis data structures in order to perform queries similar to DB queries in redis?

Tags:

sql

redis

I have tables like Job, JobInfo. And i want to perform queries like below -

"SELECT J.JobID FROM Job J, JobInfo B WHERE B.JobID = J.JobID AND BatchID=5850 AND B.Status=0 AND J.JobType<>2"

How shall i go about writing my redis data types so that i can map such queries in redis?

IF i try to map the rows of table job in a redis hash for e.g. (hash j jobid 1 status 2) & similarly the rows of table JobInfo in again a redis hash as (hash jinfo jobid 1 jobtype 3.)

So my tables can be a set of hashes. Job table can be set with entries JobSet:jobid & JobInfo table can be set with entries like JobInfoSet:jobid

But i am confused in when i will do a SINTER on JobSet & JobInfoSet. how am i going to query that hash to get keys? As in the hash content of set jobSet is not identical to hash content of table JobInfoSet (they may have different key value pair.

So what exactly am i going to get as an output of SINTER? And how am i going to query that output as key-value pair?

So the tables will be a collection of redis hashes

like image 742
user888270 Avatar asked May 16 '16 09:05

user888270


People also ask

What data structure does Redis use?

Redis supports Pub/Sub with pattern matching and a variety of data structures such as lists, sorted sets, and hashes.

What query language does Redis use?

Redis is an open-source, in-memory key-value data store. A NoSQL database, Redis doesn't use structured query language, otherwise known as SQL. Redis instead comes with its own set of commands for managing and accessing data.

How would you design a Redis key?

While short keys will obviously consume a bit less memory, your job is to find the right balance. Try to stick with a schema. For instance "object-type:id" is a good idea, as in "user:1000". Dots or dashes are often used for multi-word fields, as in "comment:1234:reply.to" or "comment:1234:reply-to".


2 Answers

Redis is not designed to structure the data in SQL way. Beside a in-memory key value store, it supports five types of data structures: Strings, Hashes, Lists, Sets and Sorted Sets. At high level this is a sufficient hint that Redis is designed to solve performance problems that arises due to high computation in relational data models. However, if you want to execute sql query in a in-memory structure, you may want to look at memsql.

like image 51
Asheesh Avatar answered Sep 23 '22 14:09

Asheesh


Let's break down the SQL statement into different components and I'll try to show how redis can accomplish various parts.

Select J.JobID, J.JobName from Job J;

We translate each row in "Job" into a hash in redis using the SQL primary index as the redis natural index in redis. For example: SQL

==JobId==|==Name==
123        Fred

Redis HSET Job:123 Name Fred which can be conceptualized as

Job-123 => {"Name":"Fred"}

Thus we can store columns as hash fields in redis

Let's say we do the same thing for JobInfo. Each JobInfo object has its own ID

JobInfo-876 => {"meta1": "some value", "meta2": "bla", "JobID": "123"}

In sql normally we would make a secondary index on JobInfo.JobID but in NoSql land we maintain our own secondary indexes.

Sorted Sets are great for this. Thus when we want to fetch JobInfo objects by some field, JobId in this case we can add it to a sorted set like this ZADD JobInfo-JobID 123 JobInfo-876

This results in a set with 1 element in it {JobInfo-876} which has a score of 123. I realize that forcing all JobIDs into the float range for the score is a bad idea, but work with me here.

Now when we want to find all JobInfo objects for a given JobID we just do a log(N) lookup into the index. ZRANGEBYSCORE JobInfo-JobID 123 123 which returns "JobInfo-876"

Now to implement simple joins we simply reuse this JobInfo-JobID index by storing Job keys by their JobIDs. ZADD JobInfo-JobID 123 Job-123

Thus when doing something akin to SELECT J.JobID, J.Name, B.meta1 FROM Job, JobInfo USING (JobID).

This would translate to scanning through the JobInfo-JobID secondary index and reorganizing the Job and JobInfo objects returned. ZRANGEBYSCORE JobInfo-JobID -inf +inf WITHSCORES 5 -> (Job-123, JobInfo-876)

These objects all share the same JobID. CLient side you'd then asynchronously fetch the needed fields. Or you could embed these lookups in a lua script. This lua script could make redis hang for a long time. Normally redis tries to be fair with clients and prefers you to have short batched queries instead of one long query.

Now we come to a big problem, what if we want to combine secondary indexes. Let's say we have a secondary index on JobInfo.Status, and another on Job.JobType. If we make a set of all jobs with the right JobType and use that as a filter on the JobInfo-JobID shared secondary index then we not only eliminate the bad Job elements but also every JobInfo element. We could, I guess fetch the scores(JobID) on the intersection and refetch all JobInfo objects with those scores, but we lose some of the filtering we did.

It is at this point where redis breaks down.

Here is an article on secondary indexes from the creator of redis himself: http://redis.io/topics/indexes He touches multi-dimensional indexes for filtering purposes. As you can see he designed the data structures in a very versatile way. One that is the most appealing is the fact that sorted set elements with the same score are stored in lexicographical order. Thus you can easily have all elements have a score of 0 and piggyback on Redis's speed and use it more like cockroachDB, which relies on a global order to implement many SQL features.

like image 37
Brian Avatar answered Sep 23 '22 14:09

Brian