Storing large amounts of data: DB or File System?

Tags: database, indexing, filesystems, data-structures, database-design

Let's say my application creates, stores and retrieves a very large number of entries (tens of millions). Each entry holds a variable amount of data (for example, some entries have only a few bytes such as an ID/title, while others may have megabytes of supplementary data). The basic structure of each entry is the same and is in XML format.

Entries are created and edited (most likely by appending, not rewriting) arbitrarily.

Does it make sense to store entries as separate files in a file system while keeping necessary sets of indexes in the DB vs. saving everything in a DB?

741

asked Jan 16 '10 22:01

mvbl fst

1 Answer

It really depends on how you're going to use it. Databases can handle more entries in a table than most people think, especially with proper indexing. On the other hand, if you aren't going to be making use of the functionality that a relational database provides, there might not be much reason to use it.

Ok, enough generalizing. Given that a database eventually boils down to "files on disk" anyway, I wouldn't worry too much about what "the right thing to do" is. If the primary purpose of the database is just to efficiently retrieve these files, I think it would be perfectly fine to keep the DB entries small and look up file paths instead of actual data - especially since your file system should be pretty efficient at retrieving data given a specific location.
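To make that concrete, here is a minimal sketch of the "small DB rows, data in files" layout, using Python's built-in sqlite3 module. The table name `entries`, the helper names, and the choice to shard files into 1000 subdirectories are all assumptions made for illustration, not something the question specifies. The entry body is treated as opaque text, so keeping the XML well-formed after an append is left to the caller.

```python
import sqlite3
from pathlib import Path


def open_index(db_path):
    """Open (or create) the small index database. Schema is hypothetical."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries ("
        "id INTEGER PRIMARY KEY, title TEXT NOT NULL, path TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_entries_title ON entries (title)")
    return conn


def entry_path(root, entry_id):
    # Spread files over subdirectories so that no single directory
    # ends up holding millions of files (the 1000-way split is an assumption).
    return Path(root) / f"{entry_id % 1000:03d}" / f"{entry_id}.xml"


def create_entry(conn, root, entry_id, title, body):
    """Write the body to its own file and record only the path in the DB."""
    path = entry_path(root, entry_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(body, encoding="utf-8")
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO entries (id, title, path) VALUES (?, ?, ?)",
            (entry_id, title, str(path)),
        )


def _lookup_path(conn, entry_id):
    row = conn.execute(
        "SELECT path FROM entries WHERE id = ?", (entry_id,)
    ).fetchone()
    if row is None:
        raise KeyError(entry_id)
    return row[0]


def append_to_entry(conn, entry_id, more):
    """Append to the entry's file; the DB row does not change."""
    with open(_lookup_path(conn, entry_id), "a", encoding="utf-8") as f:
        f.write(more)


def load_entry(conn, entry_id):
    """Look up the path in the DB, then read the data from the file system."""
    return Path(_lookup_path(conn, entry_id)).read_text(encoding="utf-8")
```

The DB rows stay a few dozen bytes each regardless of how large an entry grows, and appends never touch the database at all. The trade-off is that the file write and the row insert are not one atomic operation, so a crash between the two can leave an orphaned file.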

In case you're interested, this is actually a common data storage pattern for search engines - the index will store the indexed data and a pointer to the stored data on disk, rather than storing everything in the index.
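As a rough illustration of that pattern (not how any particular search engine implements it), the sketch below keeps an in-memory index from terms to pointers, where each pointer is an (offset, length) pair into a single append-only data file. The class name `TinyStore` and its methods are made up for this example.

```python
import os


class TinyStore:
    """Toy index: terms map to (offset, length) pointers into a data file."""

    def __init__(self, data_path):
        self.data_path = data_path
        self.postings = {}  # term -> list of (offset, length)

    def add(self, text):
        """Append the document to the data file and index its terms."""
        data = text.encode("utf-8")
        with open(self.data_path, "ab") as f:
            # seek() returns the new absolute position, i.e. the current end of file
            offset = f.seek(0, os.SEEK_END)
            f.write(data)
        pointer = (offset, len(data))
        for term in set(text.lower().split()):
            self.postings.setdefault(term, []).append(pointer)
        return pointer

    def search(self, term):
        """Resolve each pointer for the term by reading from the data file."""
        results = []
        with open(self.data_path, "rb") as f:
            for offset, length in self.postings.get(term.lower(), []):
                f.seek(offset)
                results.append(f.read(length).decode("utf-8"))
        return results
```

The index only ever holds terms and small pointers; the stored documents themselves are read from disk on demand.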

101

answered Sep 29 '22 05:09

danben
