 

Ensuring Uniqueness for a sorted set in redis

Tags:

redis

I am trying to store media objects and have them retrievable by a certain time range through redis. I have chosen a sorted set data type to do this. I am adding elements like:

zAdd: key: media:552672 score: 1355264694
zAdd: key: media:552672 score: 1355248565
zAdd: key: media:552672 score: 1355209157
zAdd: key: media:552672 score: 1355208992
zAdd: key: media:552672 score: 1355208888
zAdd: key: media:552672 score: 1355208815

Here the key is unique to the id of the location where the media was taken, and the score is the creation time of the media object. The value is the JSON-encoded media object.

When I retrieve with zRevRangeByScore, there are occasionally duplicate entries. I'm essentially using Redis as a buffer in front of an external API: if users make the same API call twice within X seconds, I return the results from the cache; otherwise I add the results to the cache without checking whether they already exist, relying on the fact that a set cannot contain duplicates. Possible known issue: if an attribute of a media object changes between cache writes, the object shows up as a duplicate.

Is there a better way to store this type of data without doing checks on the redis client side?

TL;DR: What is the best way to store and retrieve objects in Redis so that you can select a range of objects by timestamp and ensure that they are unique?

nmock asked Dec 12 '12 02:12



1 Answer

Let's make sure we're talking about the same things. Here is the terminology for Redis sorted sets:

ZADD key score member [score member ...]
summary: Add one or more members to a sorted set, or update the score of a member that already exists
  • key - the 'name' of the sorted set
  • score - the score (in our case a timestamp)
  • member - the string the score is associated with
  • A sorted set has many members, each with a score

It sounds like you are using a JSON-encoded string of the object as the member. The member is what is unique in a sorted set, so, as you say, if the object changes, its JSON string changes too and it is added as a new member. That is probably not what you want.
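
To make the duplicate behaviour concrete, here is a minimal sketch in Python that models a sorted set as a plain dict from member to score. It does not talk to Redis; the `zadd` helper, the media id and the `caption` field are made up for illustration.

```python
import json


def zadd(sorted_set, score, member):
    """Toy model of ZADD: members are unique, re-adding one only updates its score."""
    sorted_set[member] = score


# Hypothetical media object, before and after one attribute changed.
media_v1 = {"id": 1, "caption": "sunset"}
media_v2 = {"id": 1, "caption": "sunset (edited)"}

# Member = whole JSON document: the edited copy is a different string,
# so the sorted set treats it as a second member.
by_json = {}
zadd(by_json, 1355264694, json.dumps(media_v1, sort_keys=True))
zadd(by_json, 1355264694, json.dumps(media_v2, sort_keys=True))

# Member = stable identifier: the second call hits the same member.
by_id = {}
zadd(by_id, 1355264694, f"media:{media_v1['id']}")
zadd(by_id, 1355264694, f"media:{media_v2['id']}")

print(len(by_json), len(by_id))  # 2 1
```

The same object stored twice as JSON yields two members, while the identifier-based member stays unique however the object's attributes change.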

A sorted set is the Redis way to store data by timestamp, but the member that is stored in the set is usually a 'pointer' to another key in Redis.

From your description I think you want this data structure:

  • A sorted set storing all media by created timestamp
  • A string or hash for each unique media

I recommend storing the media objects in a hash as this allows more flexibility. Example:

# add some members to our sorted set
redis 127.0.0.1:6379> ZADD media 1000 media:1 1003 media:2 1001 media:3
(integer) 3
# create hashes for our members
redis 127.0.0.1:6379> HMSET media:1 id 1 name "media one" content "content string for one"
OK
redis 127.0.0.1:6379> HMSET media:2 id 2 name "media two" content "content string for two"
OK
redis 127.0.0.1:6379> HMSET media:3 id 3 name "media three" content "content string for three"
OK

There are two ways to retrieve data stored this way. If you need members within a specific timestamp range (e.g. the last 7 days), use ZREVRANGEBYSCORE to retrieve the members, then loop through them and fetch each hash with HGETALL or similar. Pipelining lets you send the whole loop to the server in one round trip.

redis 127.0.0.1:6379> ZREVRANGEBYSCORE media +inf -inf
1) "media:2"
2) "media:3"
3) "media:1"
redis 127.0.0.1:6379> HGETALL media:2
1) "id"
2) "2"
3) "name"
4) "media two"
5) "content"
6) "content string for two"
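
As a rough client-side sketch of the same two steps, the Python functions below assume `client` is a redis-py style connection (for example `redis.Redis(decode_responses=True)`) exposing `hset`, `zadd`, `zrevrangebyscore` and `pipeline`. The function names and the `media` key are illustrative and not part of the original answer.

```python
def store_media(client, media_id, created_at, fields):
    """Keep the attributes in a hash and index the hash key by timestamp."""
    member = f"media:{media_id}"
    client.hset(member, mapping=fields)         # overwrites fields if the media changed
    client.zadd("media", {member: created_at})  # same member again: no duplicate is added
    return member


def media_between(client, newest, oldest):
    """Return (member, hash) pairs with oldest <= score <= newest, newest first."""
    members = client.zrevrangebyscore("media", newest, oldest)
    pipe = client.pipeline()
    for member in members:
        pipe.hgetall(member)  # queued locally, nothing is sent yet
    # execute() sends all queued HGETALL calls together and returns their replies in order
    return list(zip(members, pipe.execute()))
```

Calling `media_between(client, "+inf", "-inf")` mirrors the ZREVRANGEBYSCORE call above; passing two Unix timestamps instead restricts the result to that range.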

If you only want the last n members (or, e.g., the 10th to the 100th most recent), you can use SORT to get the items. See the SORT documentation for the syntax and for how to retrieve different hash fields, limit the results, and use the other options.

redis 127.0.0.1:6379> SORT media BY nosort GET # GET *->name GET *->content DESC
1) "media:2"
2) "media two"
3) "content string for two"
4) "media:3"
5) "media three"
6) "content string for three"
7) "media:1"
8) "media one"
9) "content string for one"

NB: sorting a sorted set by score (BY nosort) only works from Redis 2.6.
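
From a client, the SORT call might look like the sketch below. It assumes a redis-py style `client` whose `sort` method accepts `by`, `start`, `num`, `get`, `desc` and `groups`. The function name `recent_media` and its default values are hypothetical.

```python
def recent_media(client, offset=0, count=10):
    """Fetch `count` media starting at `offset`, most recent first, with hash fields."""
    return client.sort(
        "media",
        by="nosort",               # keep the sorted set's own score order
        start=offset, num=count,   # sent as LIMIT offset count
        get=["#", "*->name", "*->content"],
        desc=True,
        groups=True,               # one (member, name, content) tuple per media
    )
```

With `offset=9, count=91` this would ask for the 10th to the 100th most recent media in a single call.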

If you plan on getting media for the last day, week, month, etc., I would recommend using a separate sorted set for each period and using ZREMRANGEBYSCORE to remove old members. You can then just use SORT on these sorted sets to retrieve the data.
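
One way to keep such a per-period sorted set trimmed is sketched below, again assuming a redis-py style `client` with `zadd` and `zremrangebyscore`. The function name, the key passed in and the choice to trim on every write are assumptions for illustration only.

```python
import time


def add_to_window(client, key, member, created_at, window_seconds, now=None):
    """Add a member to a time-window sorted set, then drop members older than the window."""
    now = time.time() if now is None else now
    cutoff = now - window_seconds
    client.zadd(key, {member: created_at})
    # The "(" prefix makes the upper bound exclusive: remove scores strictly below the cutoff.
    return client.zremrangebyscore(key, "-inf", f"({cutoff}")
```

Trimming on each write keeps the set bounded without a separate cleanup job; running ZREMRANGEBYSCORE periodically instead would work just as well.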

myanimal answered Oct 21 '22 08:10
