Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Multi-parameter match-finder with Redis

Tags:

redis

I need to create an match-finder system for some data set, as follows:

There is a set of objects, each identified by a string ObjectID.

Each object has exactly N properties Pi. Each property value is a string.

Database example for N = 3 (in real life N = 8).

ObjectID: P1     P2    P3
--------------------------------
APPLE:    RED    ROUND FRUIT
ORANGE:   ORANGE ROUND FRUIT
CARROT:   RED    LONG  VEGETABLE

The system must return sets of ObjectIDs, matching given query on object properties. In query user must specify all property values. Alternatively, for some or all properties in query user may specify a "wildcard" *, meaning that any property value would match criteria.

Example queries:

P1  P2    P3        => Result
------------------------------------
*   ROUND FRUIT     => APPLE, ORANGE
RED LONG  VEGETABLE => CARROT
RED *     *         => CARROT, APPLE

All of this is trivially done with SQL.

The question is: is there a neat way to do that with Redis?

Note that I'm interested in Redis-based solutions specifically, for self-education purposes; other DBs are offtopic for this particular question.

Update: Trivial solution with explicit ObjectID lists for each Pi and application-side filtering does not look neat enough to me :-)

like image 533
Alexander Gladysh Avatar asked Oct 29 '11 14:10

Alexander Gladysh


1 Answers

What you are trying to do here is an inverted index.

For each column, have it map to a "set". Then, you can intersect the sets to get the result.

So, APPLE: RED ROUND FRUIT would map to the following inserts:

SADD p1:RED APPLE
SADD p2:ROUND APPLE
SADD p3:FRUIT APPLE

Then, let's say I want to query for * ROUND FRUIT, I would do:

SINTER p2:ROUND p3:FRUIT

This command is taking the intersection of the items in the p2:ROUND set and the p3:FRUIT set. This will return all the items that are ROUND and FRUIT, not caring what p1 is.

Some other examples:

SMEMBERS p1:GREEN
SINTER p1:RED p2:ROUND p3:FRUIT
SUNION p1:RED p1:GREEN

My above answer is going to use some computation power because the intersection operation is O(N*M). Here is a way of doing it that is more memory intensive, but will have faster retrieval because it effectively precomputes the indexes.

For each combination of properties, make a key that stores a set:

So, APPLE: RED ROUND FRUIT would map to the following inserts:

SADD RED:ROUND:FRUIT APPLE
SADD :ROUND:FRUIT APPLE
SADD RED::FRUIT APPLE
SADD RED:ROUND: APPLE
SADD RED:: APPLE
SADD :ROUND: APPLE
SADD ::FRUIT APPLE
SADD ::: APPLE

Then, to query, you simply access the respective key. For example, * ROUND FRUIT would simply be

SMEMBERS :ROUND:FRUIT

Obviously, this doesn't scale well at all in terms of memory when you have many dimensions, but it will be extremely snappy to retrieve results.

like image 177
Donald Miner Avatar answered Nov 14 '22 01:11

Donald Miner