Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

mongoDB vs relational databases when data can't fit into memory?

First of all, I apologize for my potentially shallow understanding of NoSQL architecture (and databases in general) so try to bear with me.

I'm thinking of using mongoDB to store resources associated with an UUID. The resources can be things such as large image files (tens of megabytes) so it makes sense to store them as files and store just links in my database along with the associated metadata. There's also the added flexibility of decoupling the actual location of the resource files, so I can use a different third party to store the files if I need to.

Now, one document which describes resources would be about 1kB. At first I except a couple hundred thousands of resource documents which would equal some hundreds of megabytes in database size, easily fitting into server memory. But in the future I might have to scale this into the order of tens of MILLIONS of documents. This would be tens of gigabytes which I can't squeeze into server memory anymore.

Only the index could still fit in memory being around a gigabyte or two. But if I understand correctly, I'd have to read from disk every time I did a lookup on an UUID. Is there a substantial speed benefit from mongoDB over a traditional relational database in such a situation?

BONUS QUESTION: is there an existing, established way of doing what I'm trying to achieve? :)

like image 737
Eero Heikkinen Avatar asked Oct 10 '22 16:10

Eero Heikkinen


1 Answers

MongoDB doesn't suddenly become slow the second the entire database no longer fits into physical memory. MongoDB currently uses a storage engine based on memory mapped files. This means data that is accessed often will usually be in memory (OS managed, but assume a LRU scheme or something similar).

As such it may not slow down at all at that point or only slightly, it really depends on your data access patterns. Similar story with indexes, if you (right) balance your index appropriately and if your use case allows it you can have a huge index with only a fraction of it in physical memory and still have very decent performance with the majority of index hits happening in physical memory.

Because you're talking about UUID's this might all be a bit hard to achieve since there's no guarantee that the same limited group of users are generating the vast majority of throughput. In those cases sharding really is the most appropriate way to maintain quality of service.

like image 73
Remon van Vliet Avatar answered Oct 14 '22 01:10

Remon van Vliet