
File Storage for Web Applications: Filesystem vs DB vs NoSQL engines

I have a web application that stores a lot of user generated files. Currently these are all stored on the server filesystem, which has several downsides for me.

  • When we move "folders" (as defined by our application) we also have to move the files on disk (although this is more due to strange design decisions on the part of the original developers than a requirement of storing things on the filesystem).
  • It's hard to write tests for file system actions; I have a mock filesystem class that logs actions like move, delete etc, without performing them, which more or less does the job, but I don't have 100% confidence in the tests.
  • I will be adding some other jobs which need to access the files from other services to perform additional tasks (e.g. indexing in Solr, generating thumbnails, movie format conversion), so I need to get at the files remotely. Doing this over network shares seems dodgy...
  • Dealing with permissions on the filesystem has sometimes given us problems in the past, although now that we've moved to a pure Linux environment this should be less of an issue.

So, my main questions are

  • What are the downsides of storing files as BLOBs in MySQL?
  • Do the same problems exist with NoSQL systems like Cassandra?
  • Does anyone have any other suggestions that might be appropriate, e.g. MogileFS, etc?
El Yobo asked May 23 '10 01:05



3 Answers

Not a direct answer, but here are some pointers to very interesting and somewhat similar questions (yeah, they are about blobs and images, but this is IMO comparable).

What are the downsides of storing files as BLOBs in MySQL?

  • Storing Images in DB - Yea or Nay?
  • Images in database vs file system
  • https://stackoverflow.com/search?q=images+database+filesystem

Do the same problems exist with NoSQL systems like Cassandra?

  • NoSQL for filesystem storage organization and replication?
  • Storing images in NoSQL stores

PS: I don't want to be a killjoy, but I don't think that any NoSQL solution is going to solve your problem (NoSQL is just irrelevant for most businesses).

Pascal Thivent answered Sep 22 '22 03:09


Maybe a hybrid solution.

Use a database to store metadata about each file, and use the filesystem to store the file itself.

Any restructuring of 'folders' could then be modelled in the DB, decoupled from the file's actual location on disk.
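
A minimal sketch of how that could look, assuming SQLite for the metadata and a content hash as the on-disk name. The `HybridStore` class, the `files` table and its columns are all made up for illustration; they are not from the original application.

```python
import hashlib
import sqlite3
from pathlib import Path


class HybridStore:
    """Metadata lives in a database; file bytes live on disk under an opaque key."""

    def __init__(self, root: Path, db: sqlite3.Connection) -> None:
        self.root = root
        self.db = db
        # Hypothetical schema: one row per logical file.
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS files ("
            " id INTEGER PRIMARY KEY,"
            " folder TEXT NOT NULL,"
            " name TEXT NOT NULL,"
            " storage_key TEXT NOT NULL)"
        )

    def _path_for(self, key: str) -> Path:
        # Fan out by the first two hex characters so no directory grows huge.
        return self.root / key[:2] / key

    def save(self, folder: str, name: str, data: bytes) -> int:
        # The on-disk name depends only on the content, never on the folder.
        key = hashlib.sha256(data).hexdigest()
        path = self._path_for(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
        cur = self.db.execute(
            "INSERT INTO files (folder, name, storage_key) VALUES (?, ?, ?)",
            (folder, name, key),
        )
        self.db.commit()
        return cur.lastrowid

    def move_folder(self, old: str, new: str) -> None:
        # Only metadata changes; nothing is moved on disk.
        self.db.execute("UPDATE files SET folder = ? WHERE folder = ?", (new, old))
        self.db.commit()

    def read(self, file_id: int) -> bytes:
        row = self.db.execute(
            "SELECT storage_key FROM files WHERE id = ?", (file_id,)
        ).fetchone()
        if row is None:
            raise KeyError(file_id)
        return self._path_for(row[0]).read_bytes()
```

With something like this, `store.move_folder("/reports", "/archive")` is a single UPDATE and the files on disk stay where they are. Note the sketch only renames an exact folder (nested subfolders would need a prefix match), and because identical content shares one file on disk, deletes would need reference counting.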

Randy answered Sep 21 '22 03:09


You can store files up to 2GB easily in Cassandra by splitting them into 1MB columns or so. This is pretty common.

You could store it as one big column too, but then you'd have to read the whole thing into memory when accessing it.
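
To illustrate the chunking idea only: the sketch below splits a stream into fixed-size pieces and keeps them in an in-memory dict that stands in for one row with one column per chunk. It does not talk to Cassandra at all; `ChunkStore`, `iter_chunks` and the row layout are hypothetical names chosen for the example.

```python
from typing import BinaryIO, Dict, Iterator, Tuple

CHUNK_SIZE = 1024 * 1024  # 1 MB, the figure suggested above


def iter_chunks(
    stream: BinaryIO, chunk_size: int = CHUNK_SIZE
) -> Iterator[Tuple[int, bytes]]:
    """Yield (index, bytes) pairs, reading at most chunk_size bytes at a time."""
    index = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield index, chunk
        index += 1


class ChunkStore:
    """In-memory stand-in for a wide row: file_id -> {chunk index -> bytes}."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[int, bytes]] = {}

    def put(self, file_id: str, stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> int:
        """Store the stream as numbered chunks and return how many were written."""
        row: Dict[int, bytes] = {}
        for index, chunk in iter_chunks(stream, chunk_size):
            row[index] = chunk
        self.rows[file_id] = row
        return len(row)

    def get(self, file_id: str) -> Iterator[bytes]:
        """Yield chunks in order, so a reader never needs the whole file at once."""
        row = self.rows[file_id]
        for index in sorted(row):
            yield row[index]
```

Writing reads one chunk at a time from the stream, and reading yields one chunk at a time, so memory use on either side stays around the chunk size rather than the file size. In a real cluster the dict would be replaced by writes to the store through whatever client library you use.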

jbellis answered Sep 22 '22 03:09