Which clustered NoSQL DB for a Message Storing purpose?

Yet another question about which NoSQL database to choose. However, I haven't yet found anyone asking about this particular use case: message storage.

I have built a chat server in Erlang, and I'm already using MySQL to store friend lists and other data that needs JOINs.

I would like to store messages (ones a user has not received because they were offline) and retrieve them later.

I have made a preliminary selection. I can't use something like MongoDB because of its RAM-oriented design and because it doesn't cluster as well as the others. I have narrowed my list down to three choices:

HBase
Riak
Cassandra

I know that their data models are quite different: one uses key/value, another uses SuperColumns and so on.

So far I have preferred Riak because of its stable Erlang client library.

I know that I can use Cassandra through Thrift, but that doesn't seem very stable with Erlang (I haven't heard good feedback about it).

I don't really know anything about HBase right now; I just know it exists and that it is modeled on Google's Bigtable rather than on Dynamo like Riak (Cassandra borrows from both).

So here's what I need to do:

Store from 1 to X messages per registered user.
Get the number of stored messages per user.
Retrieve all messages for a user at once.
Delete all messages for a user at once.
Delete all messages older than X months.

I'm really new to these NoSQL databases; I've always been a MySQL aficionado. That's why I'm asking, as a newbie: could someone with more experience help me choose the one that fits best and lets me do everything I need without too much hassle?

Thanks!


asked Apr 23 '12 18:04

TheSquad

1 Answer

I can't speak for Cassandra or Hbase, but let me address the Riak part.

Yes, Riak would be appropriate for your scenario (and I've seen several companies and social networks use it for a similar purpose).

To implement this, you would need the plain Riak Key/Value operations, plus some sort of indexing engine. Your options are (in rough order of preference):

CRDT Sets. If your one-to-many collection is reasonably small (say, fewer than 50 messages per user), you can store the keys of the child collection in a CRDT Set data type.
Riak Search. If your collection size is large, and especially if you need to search your objects on arbitrary fields, you can use Riak Search. It spins up Apache Solr in the background, and indexes your objects according to a schema you define. It has pretty awesome searching, aggregation and statistics, geospatial capabilities, etc.
Secondary Indexes. You can run Riak on top of an eLevelDB storage back end, and enable Secondary Index (2i) functionality.
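The CRDT Set option works because concurrent updates to a user's set of message keys merge automatically instead of overwriting each other. As a conceptual model only (not Riak's actual implementation, and not its client API), here is a toy observed-remove set in Python where a concurrent add wins over a remove. The class name `ORSet` and the `msg:` keys are hypothetical.

```python
import uuid


class ORSet:
    """Toy observed-remove set (hypothetical; models CRDT set semantics,
    not Riak's internal encoding or client library)."""

    def __init__(self):
        self.adds = {}     # element -> set of unique tags, one per add
        self.removes = {}  # element -> tags this replica had seen when removing

    def add(self, element):
        # A fresh tag per add means a concurrent add survives a remove.
        self.adds.setdefault(element, set()).add(uuid.uuid4().hex)

    def remove(self, element):
        # Only the tags observed locally are tombstoned.
        observed = self.adds.get(element, set())
        self.removes.setdefault(element, set()).update(observed)

    def value(self):
        # An element is present if it has at least one tag not yet removed.
        return {e for e, tags in self.adds.items()
                if tags - self.removes.get(e, set())}

    def merge(self, other):
        # State-based merge: union the add tags and the removed tags.
        merged = ORSet()
        for replica in (self, other):
            for e, tags in replica.adds.items():
                merged.adds.setdefault(e, set()).update(tags)
            for e, tags in replica.removes.items():
                merged.removes.setdefault(e, set()).update(tags)
        return merged


# Usage: two replicas of one user's message-key set diverge, then merge.
a = ORSet()
a.add("msg:1")
b = a.merge(ORSet())   # b starts as a copy of a
a.remove("msg:1")      # replica a deletes msg:1 ...
b.add("msg:1")         # ... while replica b concurrently re-adds it
b.add("msg:2")
print(sorted(a.merge(b).value()))  # ['msg:1', 'msg:2']
```

The merged result keeps `msg:1` because replica b's re-add carries a tag that replica a never observed, which is the add-wins behavior that makes a set a reasonable home for a user's list of message keys.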

Run a few performance tests to pick the fastest approach.

As far as schema, I would recommend using two buckets (for the setup you describe): a User bucket, and a Message bucket.

Index the Message bucket, either by associating a Search index with it or by storing a user_key via 2i. This lets you do all of the required operations (and the message log does not have to fit into memory):
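To make the two-bucket layout concrete, the sketch below models buckets and exact-match secondary-index lookups with plain Python dictionaries. Everything here is hypothetical: `ToyBucketStore` is not a Riak client, and the index names `user_key_bin` and `date_int` only follow Riak's convention of suffixing 2i names with `_bin` (string) or `_int` (integer).

```python
from collections import defaultdict


class ToyBucketStore:
    """Hypothetical in-memory stand-in for Riak buckets with 2i.
    It models the data layout only; it is not a Riak client."""

    def __init__(self):
        self.objects = defaultdict(dict)  # bucket -> {key: (value, indexes)}
        self.index = defaultdict(set)     # (bucket, index_name, term) -> {keys}

    def put(self, bucket, key, value, indexes=()):
        self.delete(bucket, key)  # drop stale index entries on overwrite
        self.objects[bucket][key] = (value, tuple(indexes))
        for name, term in indexes:
            self.index[(bucket, name, term)].add(key)

    def get(self, bucket, key):
        entry = self.objects[bucket].get(key)
        return entry[0] if entry else None

    def delete(self, bucket, key):
        entry = self.objects[bucket].pop(key, None)
        if entry:
            for name, term in entry[1]:
                self.index[(bucket, name, term)].discard(key)

    def query_eq(self, bucket, name, term):
        """Exact-match 2i-style query: keys whose index `name` equals `term`."""
        return sorted(self.index.get((bucket, name, term), set()))


# Usage: a User bucket plus a Message bucket indexed by user_key and date.
store = ToyBucketStore()
store.put("users", "u42", {"name": "alice"})
store.put("messages", "m1", {"from": "bob", "body": "hi"},
          indexes=[("user_key_bin", "u42"), ("date_int", 1335200000)])
store.put("messages", "m2", {"from": "carol", "body": "yo"},
          indexes=[("user_key_bin", "u42"), ("date_int", 1335300000)])
print(store.query_eq("messages", "user_key_bin", "u42"))  # ['m1', 'm2']
```

Keeping each message as its own object and pointing back at the user through an index is what lets the message log grow without one ever-larger value per user.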

Store from 1 to X messages per registered user - Once you create a User object and get a user key, storing an arbitrary number of messages per user is easy: they are straightforward writes to the Message bucket, with each message storing the appropriate user_key as a secondary index.
Get the number of stored messages per user - No problem. Get the list of message keys belonging to a user (via a search query, by retrieving the Set object where you're keeping the keys, or via a 2i query on user_key). This lets you get the count on the client side.
retrieve all messages from a user at once - See previous item. Get the list of keys of all messages belonging to the user (via Search, Sets or 2i), and then fetch the actual messages for those keys by multi-fetching the values for each key (all the official Riak clients have a multiFetch capability, client-side).
delete all messages from a user at once - Very similar: get the list of message keys for the user, then issue Deletes for them on the client side.
delete all messages that are older than X months - Add an index on the message date. Then retrieve all message keys older than X months (via Search or 2i) and issue client-side Deletes for them.
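The five operations above can be sketched against a toy in-memory model in which a dict stands in for the Message bucket and two more dicts stand in for the user_key and date indexes. All names here are hypothetical; with a real Riak client each index lookup would be a 2i, Search, or Set query, and each delete loop would be client-side Delete requests. `months_before` computes the "older than X months" cutoff using calendar months.

```python
import calendar
import datetime as dt
from collections import defaultdict

# Hypothetical in-memory stand-ins (not a Riak client):
messages = {}               # Message bucket: message_key -> record
by_user = defaultdict(set)  # user_key index: user_key -> {message_key}
sent_dates = {}             # date index: message_key -> sent_at


def months_before(d, months):
    """Same day `months` calendar months earlier, clamped to month length."""
    y, m = divmod(d.year * 12 + d.month - 1 - months, 12)
    day = min(d.day, calendar.monthrange(y, m + 1)[1])
    return d.replace(year=y, month=m + 1, day=day)


def store_message(msg_key, user_key, body, sent_at):
    """1. Store: a plain write, tagged with user_key and date index entries."""
    messages[msg_key] = {"user_key": user_key, "body": body}
    by_user[user_key].add(msg_key)
    sent_dates[msg_key] = sent_at


def keys_for_user(user_key):
    """Index lookup on user_key (stands in for a 2i, Search or Set query)."""
    return sorted(by_user.get(user_key, set()))


def count_messages(user_key):
    """2. Count: length of the key list, computed client-side."""
    return len(keys_for_user(user_key))


def fetch_all(user_key):
    """3. Retrieve: get the keys, then fetch every value (a multi-fetch)."""
    return [messages[k] for k in keys_for_user(user_key)]


def _delete(msg_key):
    record = messages.pop(msg_key)
    by_user[record["user_key"]].discard(msg_key)
    sent_dates.pop(msg_key, None)


def delete_all(user_key):
    """4. Delete by user: get the keys, then issue one delete per key."""
    for k in keys_for_user(user_key):
        _delete(k)


def delete_older_than(cutoff):
    """5. Expire: scan the date index (in place of a 2i range query),
    then delete each matching key. Returns the deleted keys."""
    old = sorted(k for k, sent_at in sent_dates.items() if sent_at < cutoff)
    for k in old:
        _delete(k)
    return old


# Usage
now = dt.datetime(2012, 4, 23)
store_message("m1", "u42", "hi", dt.datetime(2011, 12, 1))
store_message("m2", "u42", "still there?", dt.datetime(2012, 4, 20))
store_message("m3", "u7", "ping", dt.datetime(2012, 4, 22))
print(count_messages("u42"))                     # 2
print(delete_older_than(months_before(now, 3)))  # ['m1']
print([m["body"] for m in fetch_all("u42")])     # ['still there?']
delete_all("u42")
print(count_messages("u42"))                     # 0
```

Every operation reduces to "look up keys through an index, then do plain key/value reads or deletes", which is why the indexing choice matters more than the storage layout itself.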

answered Dec 01 '22 18:12

Dmitri Zagidulin

Tags: nosql, cassandra, erlang, hbase, riak
