When dealing with small projects, what do you feel is the break even point for storing data in simple text files, hash tables, etc., versus using a real database? For small projects with simple data management requirements, a real database is unnecessary complexity and violates YAGNI. However, at some point the complexity of a database is obviously worth it. What are some signs that your problem is too complex for simple ad-hoc techniques and needs a real database?
Note: To people used to enterprise environments, this will probably sound like a weird question. However, my problem domain is bioinformatics. Most of my programming is prototypes, not production code. I'm primarily a domain expert and secondarily a programmer. Most of my code is algorithm-centric, not data management-centric. The purpose of this question is largely for me to figure out how much work I might save in the long run if I learn to use proper databases in my code instead of the more ad-hoc techniques I typically use.
The primary advantage of a database vs a text file or an single data file (aka a flat file) is that a database allows you to capture relationships across single data files.
In cryptography, plaintext is usually ordinary readable text before it is encrypted into ciphertext, or readable text after it is decrypted. Data input to or output from encryption algorithms is not always plaintext.
Wikipedia tells us that a database is an organized collection of data. By that measure, your text file is a database.
So for very simple operations, the filesystem is definitely faster. Filesystems will probably beat an RDBMS for raw read throughput too since there is less overhead. In fact, if you think about it, the database can never be faster than the filesystem it sits on in terms of raw throughput.
1) Concurrency. Do you have multiple people accessing the same dataset? Then it's going to get pretty involved to broker all of the different readers and writers in a scalable fashion if you roll your own system.
2) Formatting and relationships: Is your data something that doesn't fit neatly into a table structure? Long nucleotide sequences and stuff like that? That's not really conveniently tabular data.
Another example: Nobody would consider implementing software like Photoshop to store PSDs in a relational format, because the data structures don't really lend themselves to that type of storage or query pattern.
3) ACID (sort of a corollary to #1): If Atomicity, Consistency, Integrity, and Durability are not challenges with a flat file, then go with a flat file.
For me, the line is crossed once I have to query my data in ways that involve more than a single relationship. Relating two flat data structures on disk is fairly simple, but once we get beyond that, a set-based language like SQL and formal database relationships actually reduce complexity.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With