I am looking for the database/mechanism to store the data where I can write the data and read the data with high performance.
This storage is used to for storing the Logging like important information across multiple systems. Since it's critical data which will be logged, read performance should be pretty fast as these data will be used to show history. Since we never do update on them/delete on them/or do any kinda joins, I am looking for right solution.
Probably we might archive the data in long time but that's something ok to deal with.
I tried looking at different sources to understand different NoSql databases, experts opinion is always better :)
Must Have:
1. Fast Read without fail
2. Fast Write without fail
3. Random access Performance
4. Replication kinda feature, one goes down, immediately another should be up and working
5. Concurrent write/read data
Good to Have:
1. Search content like analysing the data for auditing with/without Indexes
Don't required:
1. Transactions are not required at all
2. Update never happens
3. Delete never happens
4. Joins are not required
Referred: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
4) Best Databases for 2021: MongoDB MongoDB is suitable for hierarchical data storage and is almost 100 times faster than Relational Database Management System (RDBMS). This platform centers around the CAP theorem (Consistency, Availability, and Partition tolerance.)
If we have a vast variety of attributes in our data and a vast variety of queries, we use a Document DB like MongoDB or Couchbase. But if we have to work on an extremely large scale but have few types of queries we need to run, then we go for a Columnar DB like Cassandra or HBase.
NoSQL databases store data in documents rather than relational tables. Accordingly, we classify them as “not only SQL” and subdivide them by a variety of flexible data models. Types of NoSQL databases include pure document databases, key-value stores, wide-column databases, and graph databases.
Disclosure: Kevin Porter is a Senior Software Engineer at Aerospike, Inc. since May 2013. (ref)
Be sure to consider Aerospike; Aerospike dominates in the adtech space where high throughput reads and writes are a required. Aerospike is frequently touted as having "the speed of Redis with the scalability of Cassandra." For searching/querying see Aerospike's secondary index documentation.
For more information see the discussion/articles below:
Lastly verify the performance for yourself with the One million TPS on EC2 Instructions.
Let me be the Cassandra sponsor.
Disclaimer: I don't say Cassandra is better than the others because I don't even know so deeply mongo/redis/whatever and I don't want even come into this kind of stuffs.
The reason why I suggest Cassandra is because your needs match perfectly with what Cassandra offers and your "don't required list" is a set of feature that are either not supported in Cassandra (joins for instances) or considered an anti-pattern (deletes and in some situations updates).
From your "Must Have" list, point by point
Fast Read without fail: Supported. You can choose the consistency level of each read operation deciding how much important is to retrieve the most fresh information and how much important is speed
Fast Write without fail: Same as point 1
Random access Performance: When coming in the Cassandra world you have to consider many parameters to get a random access performance but the most important that comes into my mind is the data model -- if you create a data model that scales horizontally (give a look here) and you avoid hotspots you get what you need. If you model your DB in a good way you should have O(1) for each operation since data are structured to be queried
Replication: In this Cassandra is even better than what you might think. If one node goes down nothing changes to the cluster and everything(*) keep working perfectly. Cassandra spots no single point of failure. I can tell you with older Cassandra version I've had an uptime of more than 3 years
Concurrent write/read data: Cassandra uses the lww policy (last-write-wins) to handle concurrent writes on the same key. The system supports multiple read-write and with newer protocols also async operations.
There are lots of other interesting features Cassandra offers: linear horizontal scaling is the one I appreciate more but there is also the fact that you can know the instant in which every piece of data has been updated (the timestamp of lww), counters features and so on.
(*) - if you don't use Consistency Level All which, imho, should NEVER be used in such a system.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With