 

Extremely high QPS - DynamoDB vs MongoDB vs other NoSQL?

Tags: mongodb, nosql

We are building a system that will need to serve loads of small requests from day one. By "loads" I mean ~5,000 queries per second. For each query we need to retrieve ~20 records from a NoSQL database. There will be two batch reads: 3-4 records at first, and then 16-17 more immediately after that (based on the result of the first read). That comes to ~100,000 object reads per second.

Until now we have been thinking about using DynamoDB for this, as it's really easy to get started with.

Storage is not something I would be worried about, as the objects will be really tiny. What I am worried about is the cost of reads. DynamoDB costs $0.0113 per hour per 100 eventually consistent reads per second (eventual consistency is fine for us). That is $11.30 per hour for us, provided that all objects are up to 1KB in size. And that would be $5,424 per month based on 16 hours/day of average usage.
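For clarity, here's how that figure breaks down from the numbers above:

    5,000 queries/sec × ~20 records   = ~100,000 reads/sec
    100,000 / 100                     = 1,000 billing units
    1,000 × $0.0113/hour              = $11.30/hour
    $11.30/hour × 16 h/day × 30 days  = $5,424/month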

So... $5,424 per month.

I would consider other options but I am worried about maintenance issues, costs etc. I have never worked with such setups before so your advice would be really valuable.

What would be the most cost-effective (yet still hassle-free) solution for such read/write intensive application?

asked Aug 26 '12 by sPaul

People also ask

Which is faster, DynamoDB or MongoDB?

DynamoDB is lightning fast (often faster than MongoDB), so it is frequently used as an alternative to session storage in scalable applications. DynamoDB best practices also suggest that if you have a lot of rarely accessed data, you should move it to a separate table.

Why is MongoDB known as a high-performance NoSQL database?

MongoDB offers a highly flexible and scalable document structure compared to relational SQL databases. For example, one document in a MongoDB collection can have five fields while another in the same collection has ten.

Is DynamoDB more expensive than MongoDB?

DynamoDB starts at a lower cost, with easy setup and maintenance of security and tables. MongoDB provides more features, such as data validation, indexing strategies, richer query operations, and more data types, but it also costs more to get started.


1 Answer

From your description above, I'm assuming that your 5,000 queries per second are entirely read operations. This is essentially what we'd call a data warehouse use case. What are your availability requirements? Does it have to be hosted on AWS and friends, or can you buy your own hardware to run in-house? What does your data look like? What does the logic which consumes this data look like?

You might get the sense that there really isn't enough information here to answer the question definitively, but I can at least offer some advice.

First, if your data is relatively small and your queries are simple, save yourself some hassle and make sure you're querying from RAM instead of disk. Any modern RDBMS with support for in-memory caching/tablespaces will do the trick. Postgres and MySQL both have features for this. In the case of Postgres, make sure you've tuned the memory parameters appropriately, as the out-of-the-box configuration is designed to run on pretty meager hardware. If you must use a NoSQL option then, depending on the structure of your data, Redis is probably a good choice (it's also primarily in-memory). However, in order to say which flavor of NoSQL might be the best fit, we'd need to know more about the structure of the data you're querying and what queries you're running.
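For example, here is a minimal sketch of the Postgres memory knobs involved (the values are illustrative for a box with ~32GB of RAM, not recommendations):

    # postgresql.conf -- illustrative values, tune for your hardware
    shared_buffers = 8GB          # Postgres' own buffer cache; the default is tiny
    effective_cache_size = 24GB   # planner hint: how much the OS page cache will hold
    work_mem = 64MB               # per-sort/per-hash memory for each operation

And if Redis turns out to fit your data, the batched access pattern you describe maps naturally onto MGET (the key names here are hypothetical):

    redis> MGET parent:1 parent:2 parent:3
    redis> MGET child:10 child:11 child:12 child:13 child:14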

If the queries boil down to SELECT * FROM table WHERE primary_key = {CONSTANT} - don't bother messing with NoSQL - just use an RDBMS and learn how to tune the dang thing. This is doubly true if you can run it on your own hardware. If the connection count is high, use read slaves to balance the load.
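To make that concrete, the access pattern from your question boils down to two primary-key batch lookups; in SQL (with a hypothetical schema) it's just:

    -- first batch read: 3-4 records that tell you what to fetch next
    SELECT id, child_ids FROM parents WHERE id IN (1, 2, 3);

    -- second batch read: 16-17 records keyed off the first result
    SELECT * FROM children WHERE id IN (10, 11, 12, 13, 14, 15, 16);

Both are index-backed primary-key lookups - exactly the kind of query an RDBMS will happily serve from RAM all day.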

Long-after-the-fact edit (5/7/2013): Something I should've mentioned before: EC2 is a really, really crappy place to measure the performance of self-managed database nodes. Unless you're paying through the nose, your I/O performance will be terrible. Your choices are to pay big money for provisioned IOPS, RAID together a bunch of EBS volumes, or rely on ephemeral storage whilst syncing a WAL off to S3 or similar. All of these options are expensive and difficult to maintain, and they offer varying degrees of performance.

I discovered this for a recent project, so I switched to Rackspace. The performance increased tremendously there, but I noticed that I was paying a lot for CPU and RAM resources when really I just need fast I/O. Now I host with Digital Ocean. All of DO's storage is SSD. Their CPU performance is kind of crappy in comparison to other offerings, but I'm incredibly I/O bound so I Just Don't Care. After dropping Postgres' random_page_cost to 2, I'm humming along quite nicely.

Moral of the story: profile, tune, repeat. Ask yourself what-if questions and constantly validate your assumptions.

Another long-after-the-fact edit (11/23/2013): As an example of what I'm describing here, check out the following article on using MySQL 5.7 with the InnoDB memcached plugin to achieve 1M QPS: http://dimitrik.free.fr/blog/archives/11-01-2013_11-30-2013.html#2013-11-22
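Very roughly, per the MySQL documentation, that setup enables the daemon_memcached plugin and then reads InnoDB rows over the memcached text protocol, bypassing the SQL layer entirely (a sketch based on the plugin's bundled sample config; the demo key is from that sample, not from the article):

    -- enable the InnoDB memcached plugin (available since MySQL 5.6)
    INSTALL PLUGIN daemon_memcached SONAME "libmemcached.so";

    -- then read rows over the memcached protocol (default port 11211), e.g.:
    --   get AA      -> returns the row from the demo table keyed 'AA'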

answered Sep 29 '22 by Ben Burns