Using a Filesystem (Not a Database!) for Schemaless Data - Best Practices

After reading over my other question, Using a Relational Database for Schema-Less Data, I began to wonder if a filesystem is more appropriate than a relational database for storing and querying schemaless data.

Rather than building a file system on top of MySQL, why not save the data directly to the filesystem? Indexing needs to be figured out, but modern filesystems are very stable, offer features such as replication, snapshots, and backup facilities, and are flexible about storing schemaless data.

However, I can't find any examples of someone using a filesystem instead of a database.

Where can I find more resources on how to implement a schemaless (or "document-oriented") database as a layer on top of a filesystem? Is anyone using a modern filesystem as a schemaless database?

991

asked Nov 15 '10 23:11

Évariste Galois

2 Answers

Yes, a filesystem can be treated as a special case of a NoSQL-like database system. It has some limitations that should be considered in any design decision:

pros:

simple, intuitive
takes advantage of years of tuning and caching algorithms
easy backup, potentially easy clustering

things to think about:

richness of metadata - what types of data does it store, how does it let you query them, can you have hierarchical or multivalued attributes
speed of querying metadata - not all filesystems are particularly well optimized for querying anything other than size and dates
inability to join queries (though that's pretty much common to NoSQL)
inefficient storage usage (unless the file system performs block suballocation, you'll typically blow 4-16K per item stored regardless of size)
may not have the kind of caching algorithm you want for its directory structure
tends to be less tunable, etc.
backup solutions may have trouble depending on how you store things - too deep, too many items per node, etc. - which might obviate an obvious advantage of such a structure
locking for a LOCAL filesystem works pretty well, of course, if you call the right routines, but not necessarily for a network-based filesystem (those problems have been solved in various ways, but it's certainly a design issue)
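
As a rough illustration of "calling the right routines" on a local POSIX filesystem, here is a minimal Python sketch of a read-modify-write guarded by an advisory lock. The function name `locked_increment` and the one-integer-per-file layout are assumptions made for the example, not something from the answer, and `fcntl.flock` behaviour on network filesystems may differ, as noted above.

```python
import fcntl
import os
import tempfile


def locked_increment(path):
    """Increment the integer stored in `path` under an exclusive advisory lock.

    Hypothetical helper: assumes a local POSIX filesystem and that every
    writer cooperates by taking the same lock.
    """
    # O_CREAT without O_TRUNC, so an existing value is preserved.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until other holders release
        try:
            raw = f.read().strip()
            value = (int(raw) if raw else 0) + 1
            f.seek(0)
            f.truncate()
            f.write(str(value))
            f.flush()
            os.fsync(f.fileno())  # push the new value to disk before unlocking
            return value
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


# Usage: two increments on a fresh file.
counter_path = os.path.join(tempfile.mkdtemp(), "counter")
first = locked_increment(counter_path)
second = locked_increment(counter_path)
print(first, second)
```

The lock is advisory: it only protects against other processes that also call `flock` on the same file.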

154

answered Oct 06 '22 00:10

MJB

I had the same idea more than 15 years ago, when hosting costs and hardware limitations were very different from today.

My main motivation was to design a cheap and simple solution able to withstand traffic spikes. Another goal was to improve the security of the applications by taking SQL attack vectors out of the equation.

I ended up with a simple document-oriented database, more like a wrapper around FS functions.
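
The answer does not show its implementation, so the following is only a minimal Python sketch of what "a wrapper around FS functions" could look like: one JSON file per document, written through a temporary file and renamed into place. The class name `FileStore`, the `.json` suffix, and the `put`/`get` methods are all assumptions made for illustration.

```python
import json
import os
import tempfile


class FileStore:
    """Hypothetical minimal document store: one JSON file per document."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, doc_id):
        # Assumption: doc_id is a trusted, filename-safe string.
        return os.path.join(self.root, doc_id + ".json")

    def put(self, doc_id, doc):
        # Write to a temp file in the same directory, then rename it over the
        # target, so readers see either the old document or the new one.
        fd, tmp = tempfile.mkstemp(dir=self.root, suffix=".tmp")
        try:
            with os.fdopen(fd, "w") as f:
                json.dump(doc, f)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp, self._path(doc_id))
        except BaseException:
            if os.path.exists(tmp):
                os.unlink(tmp)
            raise

    def get(self, doc_id, default=None):
        try:
            with open(self._path(doc_id)) as f:
                return json.load(f)
        except FileNotFoundError:
            return default


# Usage: documents need no shared schema.
store = FileStore(tempfile.mkdtemp())
store.put("user-1", {"name": "Ada", "tags": ["a", "b"]})
store.put("order-7", {"total": 12.5})
print(store.get("user-1"), store.get("missing"))
```

The rename step relies on `os.replace` being atomic within a single filesystem, which is why the temporary file is created in the same directory as the target.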

What started as a personal project out of curiosity proved to be very rewarding in the long run. I will try to list both pros and cons.

PROS:

Fast
Cheap maintenance. Most applications I built using a file system "database" are still working today with zero maintenance on the database implementation. This was an unexpected outcome, and it happens because file system functions rarely change in any of the programming languages I used this solution with (PHP, C, C++, Erlang). I can't say the same about applications using mainstream databases. They often require fixing deprecated code, and many of my old projects are now dead in the water because either I or the clients decided not to finance the expensive upgrades anymore, or they are running old, unsupported database versions that pose a high security risk.
Resilient to attacks, being completely immune to SQL injection. Many attackers target mainstream products and are clueless when facing a custom storage facility.
Amazingly good at withstanding traffic spikes compared to many database systems that require socket connections. It's quite easy to exhaust a database's maximum connection limit, and many drivers for well-known NoSQL databases have a limited connection pool that they reuse across multiple threads, forcing the industry to design expensive distributed systems.
Unexpectedly easy to scale. In one case, when the application required much more data to be stored than I had initially anticipated, I used a distributed file system (Ceph) and solved the problem without any code modification.
Keeping the files in a RAM FS opens up many opportunities for optimization.
Did I say security? Usually all you have to do is make sure that no upload process can write to your FS database files or play tricks with file names. And, of course, apply your usual OS security measures to protect your files.
Easy to back up and maintain using file system tools.
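
The point about file-name tricks can be sketched in Python as follows. Both helpers are hypothetical (the answer does not describe its own approach): `safe_path` accepts only a conservative character set and checks that the resolved path stays under the storage root, while `hashed_path` avoids using the caller's key as a file name at all and spreads files over two directory levels.

```python
import hashlib
import os
import re

# Assumption: document ids are short ASCII tokens; adjust to your own rules.
SAFE_ID = re.compile(r"[A-Za-z0-9_-]{1,64}")


def safe_path(root, doc_id):
    """Map a caller-supplied id to a path that is guaranteed to be under root."""
    if not SAFE_ID.fullmatch(doc_id):
        raise ValueError("unsafe document id: %r" % (doc_id,))
    root = os.path.realpath(root)
    path = os.path.realpath(os.path.join(root, doc_id + ".json"))
    # Second line of defence: catches symlinks planted inside root.
    if os.path.commonpath([root, path]) != root:
        raise ValueError("path escapes storage root: %r" % (doc_id,))
    return path


def hashed_path(root, key):
    """Derive the file name from a SHA-256 digest so the key never hits the FS."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    # Two-level fan-out keeps any single directory from growing too large.
    return os.path.join(root, digest[:2], digest[2:4], digest + ".json")


# Usage: a traversal attempt is rejected by one helper and neutralised by the other.
try:
    safe_path("/tmp/fsdb", "../etc/passwd")
except ValueError as exc:
    print("rejected:", exc)
print(hashed_path("/tmp/fsdb", "../etc/passwd"))
```

Hashing loses the ability to list documents by their original key, so the key would have to be stored inside the document if it is needed later.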

CONS:

Atomic operations are hard to implement due to the lack of the supervisor processes found in more complex database systems.
Implementing counters is hard, and you will have to be quite creative in designing an FS-based locking mechanism, especially if you want to remain compatible with distributed file systems such as Ceph, for which OS-level file locks are known to be buggy.
Handling concurrent writes is tricky. I came up with a simple solution resembling Cassandra writes: adding updates as new files and having cron jobs clean up the old "versions" of the data.
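
The "updates as new files" idea might look roughly like the Python sketch below. It is an assumption-laden illustration, not the author's code: each document gets its own directory, every write adds a file named by a zero-padded nanosecond timestamp plus a random suffix, reads pick the lexicographically last name, and `cleanup` plays the role of the cron job. Ordering between machines depends on their clocks, so this is last-write-wins at best.

```python
import json
import os
import tempfile
import time
import uuid


def write_version(doc_dir, doc):
    """Add a new version file; existing versions are never modified."""
    os.makedirs(doc_dir, exist_ok=True)
    # Zero-padded timestamp sorts lexicographically; the uuid avoids name clashes.
    name = "%020d-%s.json" % (time.time_ns(), uuid.uuid4().hex)
    fd, tmp = tempfile.mkstemp(dir=doc_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(doc, f)
    os.replace(tmp, os.path.join(doc_dir, name))  # publish in one step
    return name


def _versions(doc_dir):
    try:
        return sorted(n for n in os.listdir(doc_dir) if n.endswith(".json"))
    except FileNotFoundError:
        return []


def read_latest(doc_dir):
    """Return the newest version, or None if the document does not exist."""
    names = _versions(doc_dir)
    if not names:
        return None
    with open(os.path.join(doc_dir, names[-1])) as f:
        return json.load(f)


def cleanup(doc_dir, keep=1):
    """Delete all but the newest `keep` versions; returns how many were removed."""
    names = _versions(doc_dir)
    stale = names[:-keep] if keep > 0 else names
    for n in stale:
        os.unlink(os.path.join(doc_dir, n))
    return len(stale)


# Usage: two writers never overwrite each other; cleanup trims history later.
doc_dir = os.path.join(tempfile.mkdtemp(), "doc-1")
write_version(doc_dir, {"n": 1})
write_version(doc_dir, {"n": 2})
print(read_latest(doc_dir), cleanup(doc_dir))
```

A reader that has just listed the directory can lose a race with `cleanup`, so in practice keeping a few extra versions (a larger `keep`) or only deleting files older than some grace period would be safer.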

Disclaimer: Please don't judge me too harshly :) I'm a programmer with the old mindset of being more a creator than a user of out-of-the-box solutions. I lived through the times when programmers were doing a lot from scratch to fit their needs, including... operating systems. I believe personal experiments (including reinventing the wheel) are good learning opportunities for anybody.

answered Oct 06 '22 00:10

Grigore Madalin

Tags: database, filesystems, relational-database, nosql, schemaless
