How to sort/order data?

I already have experience with MongoDB, CouchDB, Redis, Tokyo Cabinet, and other NoSQL databases. Recently I stumbled upon Riak, and it looks very interesting to me. To get started with it, I decided to write a small Twitter clone, the "hello world" of the NoSQL world. For a fully working clone, the tweets have to be ordered chronologically. After reading the Riak docs, I concluded that MapReduce is the right tool for this job. It works quite well in my development environment, but how does it perform in production, with hundreds of parallel queries? Are there other, perhaps faster, methods for sorting data, or is it possible to store data in sorted order (as in Cassandra)?
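A minimal sketch of the MapReduce approach described above. It is plain Python that only imitates the map and reduce phases over an in-memory dict; it does not use a Riak client (real Riak map/reduce functions are written in JavaScript or Erlang). The `created_at` field and all function names are assumptions for illustration.

```python
# In-memory stand-in for a Riak "tweets" bucket (key -> object).
# The "created_at" field (Unix seconds) is an assumption for this sketch.
tweets_bucket = {
    "t1": {"user": "alice", "text": "hello", "created_at": 100},
    "t2": {"user": "bob", "text": "riak!", "created_at": 300},
    "t3": {"user": "alice", "text": "nosql", "created_at": 200},
}


def map_tweet(key, obj):
    """Map phase: emit a list of records for one stored object."""
    return [{"key": key, "created_at": obj["created_at"], "text": obj["text"]}]


def reduce_newest_first(values):
    """Reduce phase: order records newest first.

    Riak may run a reduce function several times over partial results,
    so it must accept its own output as input; sorting satisfies that.
    """
    return sorted(values, key=lambda v: v["created_at"], reverse=True)


def run_job(bucket, limit=20):
    """Run map over every object, then reduce, then keep the newest `limit`."""
    mapped = [rec for key, obj in bucket.items() for rec in map_tweet(key, obj)]
    return reduce_newest_first(mapped)[:limit]


if __name__ == "__main__":
    for rec in run_job(tweets_bucket):
        print(rec["key"], rec["created_at"], rec["text"])
```

Note that a job like this touches every object it is given before sorting, which is why its cost under many concurrent queries is the open question here.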

I think I've found another solution to this problem: a simple linked list. One possible implementation is that every user gets his/her own "timeline bucket", which stores links to the tweet data (the tweets themselves are stored separately in a "tweets" bucket). As with any linked list, the timeline bucket needs a fixed entry point: a key named "first", which links to the latest timeline object and is the head of the list. To insert a new tweet into the timeline, store a new item in the timeline bucket, set its "next" link to the item that "first" currently points to, and then update "first" to point to the new item.

In short: insert an item just as you would in any linked list.
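The insertion step can be sketched with a plain Python dict standing in for one user's timeline bucket. The `{"link": ...}` shape of the "first" entry, the `tweet`/`next` field names, and the caller-supplied item keys are hypothetical; in Riak, "next" would be a real link on the stored object rather than a field.

```python
def push_tweet(timeline, item_key, tweet_key):
    """Prepend a timeline item linking to tweet_key (a singly linked list insert)."""
    head = timeline.get("first")  # None while the timeline is still empty
    # 1. The new item points at the tweet and at the current head of the list.
    timeline[item_key] = {
        "tweet": tweet_key,
        "next": head["link"] if head else None,
    }
    # 2. "first" is moved to the new item, which becomes the head.
    timeline["first"] = {"link": item_key}


def iter_tweet_keys(timeline):
    """Follow "first" and then each "next" link, newest to oldest."""
    head = timeline.get("first")
    cursor = head["link"] if head else None
    while cursor is not None:
        item = timeline[cursor]
        yield item["tweet"]
        cursor = item["next"]


if __name__ == "__main__":
    alice_timeline = {}  # stand-in for Alice's "timeline bucket"
    push_tweet(alice_timeline, "item-1", "tweet-100")
    push_tweet(alice_timeline, "item-2", "tweet-101")
    print(list(iter_tweet_keys(alice_timeline)))  # ['tweet-101', 'tweet-100']
```

Moving "first" is a read-modify-write on a single key, so two tweets arriving for the same timeline at the same moment can conflict (Riak would either create siblings or, with last-write-wins, drop one update). The sketch ignores that.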

As on Twitter, the personal timeline shows the user only the 20 most recent tweets. Retrieving those 20 tweets takes only two queries. To keep it fast, the first query uses Riak's link-walking ability to fetch the latest 20 timeline objects by following links tagged "next". The second and final query uses the keys found by the first to retrieve the tweets themselves (using MapReduce).
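The two-query read path can be imitated the same way. Here `latest_tweet_keys` stands in for the link-walk over "next" and `fetch_tweets` for the MapReduce fetch; both are hypothetical helpers over in-memory dicts, using an assumed item shape (`tweet`, `next`, and a "first" entry holding `{"link": key}`).

```python
def latest_tweet_keys(timeline, limit=20):
    """Query 1 (stand-in for a link-walk over "next"): up to `limit` tweet keys."""
    keys = []
    head = timeline.get("first")
    cursor = head["link"] if head else None
    while cursor is not None and len(keys) < limit:
        item = timeline[cursor]
        keys.append(item["tweet"])
        cursor = item["next"]
    return keys


def fetch_tweets(tweets_bucket, keys):
    """Query 2 (stand-in for a MapReduce fetch): load tweet bodies, keeping order."""
    return [tweets_bucket[k] for k in keys if k in tweets_bucket]


if __name__ == "__main__":
    # Build a 25-item timeline by hand; item "i<n>" links to tweet "t<n>".
    tweets, timeline, prev = {}, {}, None
    for i in range(25):
        tweets[f"t{i}"] = {"text": f"tweet number {i}"}
        timeline[f"i{i}"] = {"tweet": f"t{i}", "next": prev}
        prev = f"i{i}"
    timeline["first"] = {"link": prev}
    page = fetch_tweets(tweets, latest_tweet_keys(timeline))
    print(len(page), page[0]["text"])  # 20 tweet number 24
```

The walk stops after `limit` hops, so reading a timeline costs the same no matter how long the list grows.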

To remove the tweets of users you've just unfollowed, I would use the secondary index feature of Riak 1.0 to find the related timeline objects/tweets.
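For the unfollow case, the sketch below assumes each timeline item was tagged at write time with a hypothetical `author_bin` secondary index, modelled here as a plain dict from author to a set of item keys. Because the timeline is singly linked, deleting an item also means repointing its predecessor's "next", so the sketch walks the list once and splices out every matching item.

```python
def tweet_keys(timeline):
    """Walk the list from "first" along "next" links, newest to oldest."""
    head = timeline.get("first")
    cursor = head["link"] if head else None
    keys = []
    while cursor is not None:
        keys.append(timeline[cursor]["tweet"])
        cursor = timeline[cursor]["next"]
    return keys


def unfollow(timeline, author_index, author):
    """Splice every item by `author` out of the timeline.

    `author_index` maps author -> set of timeline item keys and stands in for
    an exact-match query on a hypothetical "author_bin" secondary index.
    """
    doomed = author_index.pop(author, set())
    head = timeline.get("first")
    cursor = head["link"] if head else None
    prev = None  # last item kept; its "next" may need repointing
    while cursor is not None:
        nxt = timeline[cursor]["next"]
        if cursor in doomed:
            if prev is None:
                # Removing the head: move "first" (or drop it if the list empties).
                if nxt is None:
                    timeline.pop("first", None)
                else:
                    timeline["first"] = {"link": nxt}
            else:
                timeline[prev]["next"] = nxt
            del timeline[cursor]
        else:
            prev = cursor
        cursor = nxt
```

The index lookup itself is cheap; the full walk is the price of a singly linked list. A variant could instead skip unfollowed authors at read time and clean up lazily.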

Railsmechanic asked Sep 29 '11 18:09



1 Answer

It is not possible to store data in sorted order in Riak without rewriting portions of the Riak core. Data is stored, roughly, in bucket + key order; the actual order depends on the storage backend you're using with Riak.

Riak 1.0 also has some features that might help you: support for secondary indexes, and improvements to MapReduce operations. In particular, MapReduce performs much better in highly concurrent scenarios.

Alexander Siculars wrote an article about pagination with Riak that outlines the problem pretty well. Yammer also makes extensive use of Riak, and two of their engineers put together a presentation about Riak at Yammer. It doesn't go into a lot of implementation detail, but you can learn a lot about how they designed their solution.

Combining secondary index queries with MapReduce makes it possible to solve your problem quite easily.
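A rough sketch of that combination, again in plain Python rather than a Riak client. `IntIndex` is a hypothetical stand-in for an integer secondary index on each tweet's timestamp (a range query returns the matching keys), and `newest_in_range` plays the role of a MapReduce job that takes those keys as input, loads the objects, sorts them newest first, and keeps the top N. The explicit sort is there because the sketch does not assume the index query returns keys in any particular order.

```python
import bisect


class IntIndex:
    """In-memory stand-in for a Riak integer secondary index (e.g. a timestamp)."""

    def __init__(self):
        self._values = []  # indexed values, kept sorted
        self._keys = []    # object keys, parallel to _values

    def add(self, value, key):
        i = bisect.bisect_right(self._values, value)
        self._values.insert(i, value)
        self._keys.insert(i, key)

    def range_query(self, start, end):
        """Keys whose indexed value lies in [start, end], like a 2i range query."""
        lo = bisect.bisect_left(self._values, start)
        hi = bisect.bisect_right(self._values, end)
        return list(self._keys[lo:hi])


def newest_in_range(bucket, index, start, end, limit=20):
    """2i range query as input, then a map/reduce-style load, sort and truncate."""
    keys = index.range_query(start, end)              # step 1: secondary index
    loaded = [dict(bucket[k], key=k) for k in keys]   # map: fetch each object
    loaded.sort(key=lambda o: o["created_at"], reverse=True)  # reduce: newest first
    return loaded[:limit]


if __name__ == "__main__":
    bucket, index = {}, IntIndex()
    for i in range(10):
        bucket[f"t{i}"] = {"created_at": i * 10}
        index.add(i * 10, f"t{i}")
    print([t["key"] for t in newest_in_range(bucket, index, 20, 60, limit=3)])  # ['t6', 't5', 't4']
```

The index narrows the input before the MapReduce step runs, so the sort only ever sees tweets in the requested time window instead of the whole bucket.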

Jeremiah Peschka answered Jan 04 '23 13:01
