Let's say my application creates, stores and retrieves a very large number of entries (tens of millions). Each entry holds a variable amount of data (for example, some entries have only a few bytes such as an ID and title, while others may have megabytes of supplementary data). The basic structure of each entry is the same and is in XML format.
Entries are created and edited (most likely by appending, not rewriting) arbitrarily.
Does it make sense to store entries as separate files in a file system while keeping necessary sets of indexes in the DB vs. saving everything in a DB?
A database provides a proper data recovery process, while a plain file system does not. In terms of security, a database is usually more secure than a file system.
A hard disk is a hardware component used to store large amounts of data.
MongoDB is also often recommended as a database for large amounts of text and for large data sets in general.
It really depends on how you're going to use it. Databases can handle more entries in a table than most people think, especially with proper indexing. On the other hand, if you aren't going to be making use of the functionality that a relational database provides, there might not be much reason to use it.
Ok, enough generalizing. Given that a database eventually boils down to "files on disk" anyway, I wouldn't worry too much about what "the right thing to do" is. If the primary purpose of the database is just to efficiently retrieve these files, I think it would be perfectly fine to keep the DB entries small and look up file paths instead of actual data - especially since your file system should be pretty efficient at retrieving data given a specific location.
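To make the "small DB rows, data in files" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: SQLite as the index database, the `entries` table and its columns, the helper names, and the choice to shard files into 1000 subdirectories by ID so that no single directory ends up holding millions of files. Appending is done as a raw text append, which treats each file as a sequence of XML fragments rather than one well-formed document.

```python
import sqlite3
from pathlib import Path


def open_index(db_path):
    """Open (or create) the small index database (hypothetical schema)."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries ("
        " id INTEGER PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " path TEXT NOT NULL)"
    )
    return conn


def entry_path(root, entry_id):
    """Shard files into subdirectories so no single directory grows huge."""
    return Path(root) / f"{entry_id % 1000:03d}" / f"{entry_id}.xml"


def create_entry(conn, root, entry_id, title, xml_text):
    """Write the payload to disk, then record only its location in the DB."""
    path = entry_path(root, entry_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(xml_text, encoding="utf-8")
    with conn:  # commits the index row after the file has been written
        conn.execute(
            "INSERT INTO entries (id, title, path) VALUES (?, ?, ?)",
            (entry_id, title, str(path)),
        )


def lookup_path(conn, entry_id):
    """Return the stored file path for an entry, or raise KeyError."""
    row = conn.execute(
        "SELECT path FROM entries WHERE id = ?", (entry_id,)
    ).fetchone()
    if row is None:
        raise KeyError(entry_id)
    return row[0]


def append_to_entry(conn, entry_id, more_text):
    """Append to the file without touching the index row."""
    with open(lookup_path(conn, entry_id), "a", encoding="utf-8") as f:
        f.write(more_text)


def read_entry(conn, entry_id):
    """Look up the path in the DB, then read the data from the file system."""
    return Path(lookup_path(conn, entry_id)).read_text(encoding="utf-8")
```

Note that this gives up some of what the database would otherwise handle for you: the file write and the index insert are not one atomic transaction, so a crash between them can leave an orphaned file, and backups have to cover both the database and the directory tree.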
In case you're interested, this is actually a common data storage pattern for search engines - the index will store the indexed data and a pointer to the stored data on disk, rather than storing everything in the index.
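As a rough illustration of that search-engine pattern, the toy class below keeps an in-memory term index that stores only document IDs plus an (offset, length) pointer into a single data file, and reads the stored text from disk on demand. The class name, the whitespace tokenizer, and the one-file layout are all assumptions made for this sketch; real search engines use far more elaborate on-disk formats.

```python
import os
from collections import defaultdict


class TinySearchIndex:
    """Toy index: terms point at documents, documents point at bytes on disk."""

    def __init__(self, data_path):
        self.data_path = data_path
        self.locations = []               # doc_id -> (offset, length)
        self.postings = defaultdict(set)  # term -> set of doc_ids
        open(data_path, "ab").close()     # make sure the data file exists

    def add(self, text):
        """Append the document to the data file and index its terms."""
        payload = text.encode("utf-8")
        with open(self.data_path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)  # current end of file
            f.write(payload)
        doc_id = len(self.locations)
        self.locations.append((offset, len(payload)))
        for term in set(text.lower().split()):
            self.postings[term].add(doc_id)
        return doc_id

    def search(self, term):
        """Find matching doc IDs in the index, then fetch the text from disk."""
        results = []
        with open(self.data_path, "rb") as f:
            for doc_id in sorted(self.postings.get(term.lower(), ())):
                offset, length = self.locations[doc_id]
                f.seek(offset)
                results.append(f.read(length).decode("utf-8"))
        return results
```

The index itself stays small because it never holds the document bodies, which is the same trade-off as keeping file paths rather than megabytes of XML in the database rows.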