Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Databases versus plain text

When dealing with small projects, what do you feel is the break even point for storing data in simple text files, hash tables, etc., versus using a real database? For small projects with simple data management requirements, a real database is unnecessary complexity and violates YAGNI. However, at some point the complexity of a database is obviously worth it. What are some signs that your problem is too complex for simple ad-hoc techniques and needs a real database?

Note: To people used to enterprise environments, this will probably sound like a weird question. However, my problem domain is bioinformatics. Most of my programming is prototypes, not production code. I'm primarily a domain expert and secondarily a programmer. Most of my code is algorithm-centric, not data management-centric. The purpose of this question is largely for me to figure out how much work I might save in the long run if I learn to use proper databases in my code instead of the more ad-hoc techniques I typically use.

like image 743
dsimcha Avatar asked Feb 05 '09 03:02

dsimcha


People also ask

Why is a database better than a text file?

The primary advantage of a database vs a text file or an single data file (aka a flat file) is that a database allows you to capture relationships across single data files.

What is plain text in database?

In cryptography, plaintext is usually ordinary readable text before it is encrypted into ciphertext, or readable text after it is decrypted. Data input to or output from encryption algorithms is not always plaintext.

Is a database just a text file?

Wikipedia tells us that a database is an organized collection of data. By that measure, your text file is a database.

Which is faster database or file system?

So for very simple operations, the filesystem is definitely faster. Filesystems will probably beat an RDBMS for raw read throughput too since there is less overhead. In fact, if you think about it, the database can never be faster than the filesystem it sits on in terms of raw throughput.


2 Answers

1) Concurrency. Do you have multiple people accessing the same dataset? Then it's going to get pretty involved to broker all of the different readers and writers in a scalable fashion if you roll your own system.

2) Formatting and relationships: Is your data something that doesn't fit neatly into a table structure? Long nucleotide sequences and stuff like that? That's not really conveniently tabular data.

Another example: Nobody would consider implementing software like Photoshop to store PSDs in a relational format, because the data structures don't really lend themselves to that type of storage or query pattern.

3) ACID (sort of a corollary to #1): If Atomicity, Consistency, Integrity, and Durability are not challenges with a flat file, then go with a flat file.

like image 155
Dave Markle Avatar answered Sep 30 '22 15:09

Dave Markle


For me, the line is crossed once I have to query my data in ways that involve more than a single relationship. Relating two flat data structures on disk is fairly simple, but once we get beyond that, a set-based language like SQL and formal database relationships actually reduce complexity.

like image 42
Rex M Avatar answered Sep 30 '22 15:09

Rex M