 

Storing large amounts of data: DB or File System?

Let's say my application creates, stores and retrieves a very large number of entries (tens of millions). Each entry carries a variable amount of data: some entries have only a few bytes, such as an ID and a title, while others may have megabytes of supplementary data. The basic structure of every entry is the same, and it is in XML format.

Entries are created and edited at arbitrary times (most likely by appending rather than rewriting).

Does it make more sense to store entries as separate files in the file system while keeping the necessary indexes in the DB, or to store everything in the DB?

mvbl fst asked Jan 16 '10 22:01





1 Answer

It really depends on how you're going to use it. Databases can handle more entries in a table than most people think, especially with proper indexing. On the other hand, if you aren't going to be making use of the functionality that a relational database provides, there might not be much reason to use it.
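To make the indexing point concrete, here is a small sketch using Python's built-in `sqlite3` module as a stand-in for whichever database you actually use (the table, column, and index names are made up for illustration). It loads a batch of small rows, adds a secondary index on `title`, and asks SQLite's `EXPLAIN QUERY PLAN` whether an equality lookup uses that index:

```python
import sqlite3


def build_demo_index(n_rows: int) -> sqlite3.Connection:
    """Create an in-memory table of n_rows small entries with a secondary index on title."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entries ("
        "id INTEGER PRIMARY KEY, title TEXT NOT NULL, size INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO entries (title, size) VALUES (?, ?)",
        ((f"entry-{i}", i % 1024) for i in range(n_rows)),
    )
    # Without this index, a lookup by title has to scan every row.
    conn.execute("CREATE INDEX idx_entries_title ON entries (title)")
    conn.commit()
    return conn


def lookup_plan(conn: sqlite3.Connection) -> str:
    """Return SQLite's query-plan text for a lookup by title."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT size FROM entries WHERE title = ?", ("entry-42",)
    ).fetchall()
    # The human-readable plan detail is the last column of each plan row.
    return " ".join(str(row[-1]) for row in rows)


if __name__ == "__main__":
    conn = build_demo_index(100_000)
    print(conn.execute("SELECT size FROM entries WHERE title = ?", ("entry-42",)).fetchone())
    print(lookup_plan(conn))
```

The exact wording of the plan output varies between SQLite versions, but it should name `idx_entries_title` rather than describe a full table scan.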

OK, enough generalizing. Given that a database ultimately boils down to "files on disk" anyway, I wouldn't worry too much about what "the right thing to do" is. If the primary purpose of the database is just to retrieve these files efficiently, I think it is perfectly fine to keep the DB entries small and store file paths instead of the actual data, especially since your file system should be quite efficient at retrieving data from a known location.
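A minimal sketch of that layout (small indexed rows that point at files), using Python's built-in `sqlite3` as a stand-in for whatever database is actually used. The class name `HybridStore`, the schema, and the `<root>/<id // 1000>/<id>.xml` directory sharding are all illustrative assumptions, not anything specified in the question or answer:

```python
from __future__ import annotations

import sqlite3
import tempfile
from pathlib import Path


class HybridStore:
    """Small, indexed metadata rows in SQLite; each entry's XML payload in its own file.

    Hypothetical layout: <root>/<id // 1000>/<id>.xml, so that no single
    directory has to hold millions of files.
    """

    def __init__(self, root: Path) -> None:
        self.root = root
        self.db = sqlite3.connect(str(root / "index.db"))
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS entries ("
            "id INTEGER PRIMARY KEY, title TEXT NOT NULL, path TEXT NOT NULL)"
        )
        self.db.execute("CREATE INDEX IF NOT EXISTS idx_title ON entries (title)")
        self.db.commit()

    def _path_for(self, entry_id: int) -> Path:
        return self.root / str(entry_id // 1000) / f"{entry_id}.xml"

    def _lookup_path(self, entry_id: int) -> Path:
        row = self.db.execute("SELECT path FROM entries WHERE id = ?", (entry_id,)).fetchone()
        if row is None:
            raise KeyError(entry_id)
        return self.root / row[0]

    def create(self, title: str, xml: str) -> int:
        try:
            cur = self.db.execute("INSERT INTO entries (title, path) VALUES (?, '')", (title,))
            entry_id = cur.lastrowid
            path = self._path_for(entry_id)
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(xml, encoding="utf-8")
            # Store the path relative to root so the whole tree can be moved.
            self.db.execute(
                "UPDATE entries SET path = ? WHERE id = ?",
                (str(path.relative_to(self.root)), entry_id),
            )
            self.db.commit()
        except Exception:
            self.db.rollback()  # don't leave an index row pointing at a missing file
            raise
        return entry_id

    def append(self, entry_id: int, xml_fragment: str) -> None:
        # Appending touches only the payload file; the index row is unchanged.
        with open(self._lookup_path(entry_id), "a", encoding="utf-8") as f:
            f.write(xml_fragment)

    def read(self, entry_id: int) -> str:
        return self._lookup_path(entry_id).read_text(encoding="utf-8")

    def find_by_title(self, title: str) -> list[int]:
        rows = self.db.execute("SELECT id FROM entries WHERE title = ? ORDER BY id", (title,))
        return [row[0] for row in rows]

    def close(self) -> None:
        self.db.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = HybridStore(Path(tmp))
        eid = store.create("hello", "<entry><title>hello</title></entry>\n")
        store.append(eid, "<supplement>more data</supplement>\n")
        print(store.find_by_title("hello"), store.read(eid))
        store.close()
```

Note that `append` writes raw text to the end of the payload file, so if each entry is a single XML document with a closing root tag, you would need to store entries as a sequence of fragments (or rewrite the tail) to keep them well-formed.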

In case you're interested, this is actually a common data storage pattern for search engines: the index stores the indexed data plus a pointer to where the full record lives on disk, rather than storing everything in the index.
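As a toy illustration of that pattern (the class `PointerIndex` and its methods are hypothetical, not taken from any real search engine): records are appended to a single data file, and the index keeps only an `(offset, length)` pointer per record plus an inverted index from words to record keys. It assumes each key is added once and keeps the index in memory; a real engine would persist the index and handle updates and deletions.

```python
from __future__ import annotations

import os
import tempfile
from collections import defaultdict


class PointerIndex:
    """Toy 'index holds pointers, data lives on disk' store.

    Records are appended to one data file. The in-memory index keeps only an
    (offset, length) pointer per key plus an inverted index from word to keys.
    Assumes each key is added once; nothing is persisted across restarts.
    """

    def __init__(self, data_path: str) -> None:
        self.data_path = data_path
        self.pointers: dict[str, tuple[int, int]] = {}
        self.postings: defaultdict[str, set[str]] = defaultdict(set)

    def add(self, key: str, text: str) -> None:
        payload = text.encode("utf-8")
        with open(self.data_path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)  # append position = current file size
            f.write(payload)
        self.pointers[key] = (offset, len(payload))
        for word in set(text.lower().split()):
            self.postings[word].add(key)

    def get(self, key: str) -> str:
        # Follow the pointer: one seek and one read, regardless of file size.
        offset, length = self.pointers[key]
        with open(self.data_path, "rb") as f:
            f.seek(offset)
            return f.read(length).decode("utf-8")

    def search(self, word: str) -> list[str]:
        # Resolve the query against the index, then fetch only the matching records.
        keys = sorted(self.postings.get(word.lower(), set()))
        return [self.get(key) for key in keys]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        idx = PointerIndex(os.path.join(tmp, "records.dat"))
        idx.add("a", "Storing large data on disk")
        idx.add("b", "Indexes point at data")
        print(idx.pointers)
        print(idx.search("data"))
```

The index stays small no matter how large the records get, which is the same trade-off as keeping file paths in the DB and the payloads on the file system.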

danben answered Sep 29 '22 05:09
