The company I work for is trying to switch a product that uses flat file format to a database format. We're handling pretty big files of data (ie: 25GB/file) and they get updated really quick. We need to run queries that randomly access the data, as well as in a contiguous way. I am trying to convince them of the advantages of using a database, but some of my colleagues seem reluctant to this. So I was wondering if you guys can help me out here with some reasons or links to posts of why we should use databases, or at least clarify why flat files are better (if they are).
Difference between database and flat files are given below: Database provide more flexibility whereas flat file provide less flexibility. Database system provide data consistency whereas flat file can not provide data consistency. Database is more secure over flat files.
A flat file is a collection of data stored in a two-dimensional database in which similar yet discrete strings of information are stored as records in a table. The columns of the table represent one dimension of the database, while each row is a separate record.
1- Where the flat file can be usefull: Flat file can be faster than a database, but in very specific applications. They are faster if the data is read from start to finish without any search or write. If the data dont fit in memory and need to be read fully to get the job done, It 'can' be faster than a database.
They're faster; unless you're loading the entire flat file into memory, a database will allow faster access in almost all cases. They're safer; databases are easier to safely backup; they have mechanisms to check for file corruption, which flat files do not.
This is an answer I've already given some time ago:
It depends entirely on the domain-specific application needs. A lot of times direct text file/binary files access can be extremely fast, efficient, as well as providing you all the file access capabilities of your OS's file system.
Furthermore, your programming language most likely already has a built-in module (or is easy to make one) for specific parsing.
If what you need is many appends (INSERTS?) and sequential/few access little/no concurrency, files are the way to go.
On the other hand, when your requirements for concurrency, non-sequential reading/writing, atomicity, atomic permissions, your data is relational by the nature etc., you will be better off with a relational or OO database.
There is a lot that can be accomplished with SQLite3, which is extremely light (under 300kb), ACID compliant, written in C/C++, and highly ubiquitous (if it isn't already included in your programming language -for example Python-, there is surely one available). It can be useful even on db files as big as 140 terabytes, or 128 tebibytes (Link to Database Size), possible more.
If your requirements where bigger, there wouldn't even be a discussion, go for a full-blown RDBMS.
As you say in a comment that "the system" is merely a bunch of scripts, then you should take a look at pgbash.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With