I am learning Redis and am building a geo program for learning purposes. I would like to use only Redis to store the data and am trying to avoid any relational databases. My question is how best to design the database for the program. This is how the program works:
1) I will create millions of random robots around the world which wander so they can have different geo coordinates (some robots can be in the exact same space).
2) Each robot will randomly send a post to the server (every few hours possibly on average) which will contain: a) the location of where the robot sent this data from (in either coordinates or geohash depending on the best implementation idea) b) some small text
3) I will have a map with all the robots and would like to be able to click on a robot and get this information: a) all the posts which were posted near the robot I just clicked
4) Because I will be hosting this on AWS, I will need to delete the posts every couple of hours to keep memory usage low, so some type of expiration is mandatory.
My main concern is performance and I am interested in how to design the Redis database.
In a single day, roughly 500,000,000 posts will be generated (I will work out the math for the random posts to reach this).
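As a rough sizing check on that figure, here is a minimal sketch of the arithmetic. The 2-hour retention window and the 100 bytes per post are illustrative assumptions, not numbers from the question, and the byte estimate ignores Redis's own per-key overhead.

```python
POSTS_PER_DAY = 500_000_000      # figure from the question
SECONDS_PER_DAY = 24 * 60 * 60


def posts_per_second(posts_per_day=POSTS_PER_DAY):
    """Average write rate if posts are spread evenly over the day."""
    return posts_per_day / SECONDS_PER_DAY


def posts_retained(retention_hours, posts_per_day=POSTS_PER_DAY):
    """Number of posts alive at once if each post lives retention_hours."""
    return round(posts_per_day * retention_hours / 24)


def payload_bytes(retention_hours, bytes_per_post, posts_per_day=POSTS_PER_DAY):
    """Raw payload size only; Redis data-structure overhead is not included."""
    return posts_retained(retention_hours, posts_per_day) * bytes_per_post


if __name__ == "__main__":
    print(f"{posts_per_second():.0f} posts/s on average")
    print(f"{posts_retained(2):,} posts alive with a 2 h window (assumed)")
    print(f"{payload_bytes(2, 100) / 1e9:.2f} GB payload at 100 B/post (assumed)")
```

So the write rate is on the order of several thousand posts per second, and the retention window directly scales how much has to fit in memory.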
My incomplete ideas so far:
Idea 1:
1) a post would be stored as such:
HSET [Geohash of location] [timestamp] [small text] (<-- the value will be used in a later feature to count the manual modifications I make to a post).
2) I would then be able to get all the posts near a robot by sending the geohash of the location it is in. The downside is that I would also need to include its 8 geohash neighbors, which would require 8 more queries. That is why I am also looking into the concept of spatial proximity for this feature.
HGETALL [GeoHash Location of robot]
This would then return the field ([timestamp]) and value ("0").
3) Expiration of old posts. Since I can't use the EXPIRE command to delete individual fields from a hash, I would need to scan through all the hash fields periodically, find the old timestamps, and remove them. Since Redis only allows pattern searching, this could be difficult when all the timestamps are different.
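The periodic scan described in step 3 can be sketched as a pure function over the mapping that HGETALL returns. This is only an illustration: it assumes the field names are integer Unix timestamps in seconds, and `stale_fields` is a hypothetical helper name. The returned fields would then be passed to HDEL on the same geohash key.

```python
import time


def stale_fields(fields, max_age_seconds, now=None):
    """Return the field names whose timestamp is older than max_age_seconds.

    `fields` is a mapping shaped like an HGETALL reply: timestamp -> value.
    Field names may be str or bytes; both are parsed as integer seconds.
    """
    if now is None:
        now = int(time.time())
    cutoff = now - max_age_seconds
    return [field for field in fields if int(field) < cutoff]


if __name__ == "__main__":
    # Simulated HGETALL reply for one geohash key.
    posts = {"1700000000": "0", "1700007000": "0", "1700010000": "0"}
    expired = stale_fields(posts, max_age_seconds=2 * 3600, now=1700010800)
    print(expired)  # these are the fields to hand to HDEL
```

Comparing the parsed timestamps on the client side avoids the pattern-matching problem, at the cost of reading every field of the hash on each sweep.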
Idea 2:
Use Redis-geo (https://matt.sh/redis-geo).
1) To store the posts I would run:
geoadd globalSet [posts_long] [posts_lat] "small text";
2) To get all the post information for a robot nearby:
georadius globalSet [robots_long] [robots_lat] [X] km
This would return all posts within X km of the robot.
3) I am now stuck on how to remove old posts.
OK, let's separate our tasks:
1) Let's make a ZSET whose members are robot IDs and whose scores are last-activity timestamps. In the future we will be able to delete inactive robots using this index.
ZADD ZSET:ROBOTS <timestamp> robot:17
or, even better, just 17 without the robot: prefix, because Redis can store integer-like values more compactly than arbitrary strings.
2) Let's store the robot's generic info in a hash:
HSET HSET:ROBOT:17 name "Best robot ever #17" model "Terminator T-800"
3) There are several ways to store the robot's posts and their locations. For example, we could use a regular ZSET with the multi-dimensional index technique, but that is complicated to understand, so let's use the simpler Redis GEO commands:
GEOADD GEO:ROBOT:17 13.361389 38.115556 "<timestamp>:<message-data>"
Internally, GEO uses a regular ZSET, so we can easily iterate over it with the ZRANGE or ZRANGEBYSCORE commands.
And of course we can use GEO commands like GEORADIUS for our needs.
4) The cleanup process. I suggest cleaning up by time, but you can clean up by number of entries in the same way; just use ZRANGE instead of ZRANGEBYSCORE.
Let's find all robots that have been inactive for at least a week:
ZRANGEBYSCORE ZSET:ROBOTS -inf <timestamp-of-week-before>
Now we need to iterate over those IDs, delete each robot's HSET and GEO keys, and remove the robot from our index:
ZREM ZSET:ROBOTS 17
DEL HSET:ROBOT:17
DEL GEO:ROBOT:17
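The robot cleanup above can be sketched as one function. This is a minimal sketch, not a definitive implementation: it assumes `client` is a redis-py style object (for example `redis.Redis(decode_responses=True)`) exposing `zrangebyscore`, `delete`, and `zrem`, and `purge_inactive_robots` is a hypothetical helper name. The key names follow the ones used in this answer.

```python
def purge_inactive_robots(client, cutoff_ts):
    """Delete all data for robots whose last activity is at or before cutoff_ts.

    Returns the number of robots removed.
    """
    # ZRANGEBYSCORE ZSET:ROBOTS -inf <cutoff>
    robot_ids = client.zrangebyscore("ZSET:ROBOTS", "-inf", cutoff_ts)

    for robot_id in robot_ids:
        # Replies are bytes unless the client decodes responses.
        rid = robot_id.decode() if isinstance(robot_id, bytes) else str(robot_id)
        # DEL HSET:ROBOT:<id> GEO:ROBOT:<id>
        client.delete(f"HSET:ROBOT:{rid}", f"GEO:ROBOT:{rid}")

    if robot_ids:
        # ZREM ZSET:ROBOTS <id> [<id> ...]
        client.zrem("ZSET:ROBOTS", *robot_ids)

    return len(robot_ids)
```

With millions of robots, the range query could return a very large list, so in practice it may be worth fetching it in batches (ZRANGEBYSCORE accepts a LIMIT clause) and pipelining the deletes.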
Next, we need to remove only the old GEO history entries. As I said above, GEO in Redis is a regular ZSET under the hood, so let's use ZRANGE:
ZRANGE GEO:ROBOT:17 0 -1
We will get a list of entries, but its order will look strange because each score is the encoded GEO location rather than a timestamp.
Our entries are formatted as "<timestamp>:<message-data>", so we can use split(':') and compare the timestamp; if it is too old, we remove the entry. For example, if our timestamp is 12345678 and the message is hello:
ZREM GEO:ROBOT:17 12345678:hello
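The timestamp check on each entry can be sketched as a pure function over the list that ZRANGE returns. It assumes members use the "<timestamp>:<message-data>" format from step 3 with integer timestamps; `stale_members` is a hypothetical helper name. Only the first ':' is treated as the separator, so messages that contain colons themselves still parse. The returned members are the ones to remove from the GEO key with ZREM.

```python
def stale_members(members, cutoff_ts):
    """Return the members whose leading timestamp is older than cutoff_ts."""
    stale = []
    for member in members:
        # Replies are bytes unless the client decodes responses.
        text = member.decode() if isinstance(member, bytes) else member
        # Split on the first ':' only; the message may contain more colons.
        ts_part, _, _message = text.partition(":")
        if int(ts_part) < cutoff_ts:
            stale.append(member)
    return stale


if __name__ == "__main__":
    entries = ["12345678:hello", "99999999:still:fresh"]
    print(stale_members(entries, cutoff_ts=20000000))
```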
P.S. I highly recommend reading this awesome article about ZSETs in Redis.
In short: Redis sorts items not only by score but also by member name. This means that entries with the same score are sorted lexicographically, which is very useful!
ZADD key 0 ccc 0 bbb 0 aaa
ZRANGE key 0 -1
will return the sorted set:
1. "aaa"
2. "bbb"
3. "ccc"
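The ordering rule can be modelled in plain Python, without a server, by sorting on the pair (score, member bytes). This is only a model of the behaviour described above, and `zrange_order` is a hypothetical helper name. The second example assumes timestamps are zero-padded to a fixed width, which is what makes byte order match chronological order.

```python
def zrange_order(entries):
    """Model the order of ZRANGE key 0 -1 for a {member: score} mapping.

    Members are ordered by score first, then byte-wise by member.
    """
    ordered = sorted(entries.items(), key=lambda kv: (kv[1], kv[0].encode()))
    return [member for member, _score in ordered]


if __name__ == "__main__":
    print(zrange_order({"ccc": 0, "bbb": 0, "aaa": 0}))
    # Same score, zero-padded timestamp prefix: oldest entry comes first.
    print(zrange_order({"0000000200:b": 0, "0000000100:a": 0}))
```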