I'm working on an application where data size and SQL queries are going to be heavy. I am deciding between Cassandra and Amazon SimpleDB. Can you please suggest which is more suitable for this kind of scenario?
Cassandra's data indexing seems better than Amazon SimpleDB's, but its queries have fewer options than SimpleDB's. Amazon SimpleDB also seems to have heavy I/O rates.
A few of the complex use cases involve user activities, with different filters that the user can apply to narrow down to the interesting activities.
If you think there is any other cleaner and better solution apart from these two, please suggest it.
SimpleDB is deprecated, more expensive than DynamoDB, and kind of weird to use. Backing your keystore with a deprecated service just sounds like a road to many sleepless nights ;) The utility does depend on three external services: DynamoDB, KMS, and IAM (for permissioning).
Amazon SimpleDB provides simple index and query capabilities. Amazon RDS enables you to run a fully featured relational database while offloading database administration. And, using one of our many relational database AMIs on Amazon EC2 and Amazon EBS allows you to operate your own relational database in the cloud.
Amazon SimpleDB is a highly available NoSQL data store that offloads the work of database administration. Developers simply store and query data items via web services requests and Amazon SimpleDB does the rest.
Apache Cassandra is an open source distributed database that helps store and manage large data volumes across multiple servers. DynamoDB is a fully managed distributed database provided by Amazon Web Services that can handle large amounts of data and request traffic. Apache Cassandra is a column-oriented data store.
SimpleDB can only scale by sharding, has a 10 GB size limit per domain (its equivalent of a table), and its query performance degrades as the record count grows (e.g. it is poor with 1 million records). Google's Datastore is slower than SimpleDB. Cassandra is much more scalable, and high-traffic sites have begun to use it; there is nothing better for free if you need high write rates with massive data.
If your read/write ratio is something like 90% reads and 10% writes, then Terracotta or Infinispan with Postgres is a better fit. There are some free clustering options for PostgreSQL, but none of them are mature (most are prototypes).
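To illustrate why a cache in front of the database helps a read-heavy workload, here is a minimal single-process sketch of a read-through cache. It is not the Terracotta or Infinispan API (both are Java distributed caches); the class name `ReadThroughCache`, the `loader` callback, and the TTL value are all assumptions made for illustration.

```python
import time


class ReadThroughCache:
    """Hypothetical in-memory read-through cache with a per-entry TTL."""

    def __init__(self, loader, ttl_seconds=60.0):
        self._loader = loader      # function that fetches a value from the database
        self._ttl = ttl_seconds    # how long a cached value stays valid
        self._entries = {}         # key -> (value, expiry timestamp)

    def get(self, key):
        now = time.monotonic()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]        # cache hit: the database is not touched
        value = self._loader(key)  # cache miss: fall through to the database
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key):
        # Call this after a write so the next read reloads fresh data.
        self._entries.pop(key, None)
```

With 90% reads, most `get` calls would be served from memory, and only writes (followed by `invalidate`) and expired entries reach Postgres.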
Another option is sharding. Hibernate and NHibernate have sharding support. You can use them with Postgres or MySQL, but you lose joins across shards.
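As a rough sketch of what sharding means in practice (this is not how Hibernate's or NHibernate's sharding support is implemented), the example below routes each row to a shard by hashing a key. The shard names, the `user_id` key, and the in-memory lists standing in for separate database servers are all assumptions for illustration.

```python
import hashlib

# Hypothetical shard names; in a real setup each would be a separate database server.
SHARDS = ["shard0", "shard1", "shard2", "shard3"]

# In-memory stand-ins for the per-shard tables.
stores = {name: [] for name in SHARDS}


def shard_for(user_id):
    """Pick a shard deterministically from a hash of the user id."""
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:4], "big") % len(SHARDS)]


def record_activity(user_id, action):
    """Write a row to the one shard that owns this user."""
    stores[shard_for(user_id)].append({"user_id": user_id, "action": action})


def activities_for_user(user_id):
    """Single-shard read: only the owning shard is queried."""
    rows = stores[shard_for(user_id)]
    return [row for row in rows if row["user_id"] == user_id]


def activities_by_action(action):
    """Cross-shard read: every shard is queried and the results are merged
    in application code, which is the work a database join would otherwise do."""
    results = []
    for rows in stores.values():
        results.extend(row for row in rows if row["action"] == action)
    return results
```

Queries that filter on the shard key stay cheap, while anything that spans users has to fan out to every shard, which is why joins are effectively lost.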
Regards
Something quick to consider is that any custom NoSQL setup (Cassandra, MongoDB, Redis, etc.) will be leagues faster at low volume than SimpleDB, but you take on the burden of all the server management: configuration, disaster recovery, backups, etc.
Then, as your apps scale, you are going to be spending more and more time administering your increasingly complex data layer and making sure it stays backed up and safe.
SimpleDB (and equivalent data stores) removes that concern as you scale.
So keep that in mind when choosing between cloud storage and rolling your own setup, because when disaster strikes it sucks, especially if you don't know exactly what you are doing to recover it all.