Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

sql text field vs flat file vs nosql document store

I plan on having a SQL fact table involving a text field which I don't expect to index on (I will only read out the data and very rarely update it). I think this table could get quite large, primarily due to this text field. The rest of the data in my database does make sense to be relational, however I believe I could scale much more easily and cheaply if I instead store pointers to flat files (where each pointer is to a different text file stored in something like S3) instead of using the text field.

An alternative that seems to be gaining popularity is a fully NoSQL document-based solution (e.g. CouchDB, MongoDB, etc.) I am wondering what are the tradeoffs (scalability/reliability/security/performance/ease of implementation/ease of maintenance/cost) between simply using a SQL text field, having a pointer to flat files, or completely rethinking the entire system in the context of a NoSQL document store?

like image 799
user1080972 Avatar asked Jan 18 '23 03:01

user1080972


1 Answers

The best approach is to use a relational db for the normal (non-text) data and save the large (text) data "somewhere else" that can handle large data better than a relational database can.

First, let's discuss why it's a bad idea to save large data in a relational database:'

  • row sizes become much longer, so the I/O required to read in disk pages with target rows balloons
  • backup sizes and, more importantly, backup times enlarge to the point they can cripple DBA tasks and even bring systems offline (then backups are turned off, then the disk fails, oops)
  • you typically don't need to search the text, so there's no need in having it in the database
  • relational databases and libraries/drivers typically aren't good at handling unusually large data, and the way of handling it is often vendor-specific, making any solution non-portable

Your choice of "somewhere else" is broad, but includes:

  • large data storage software like Cassandra, MongoDB, etc
  • NoSQL databases like Lucene
  • File System

Do what's easiest that will work - they are all valid as long as you do your requirements calculations for:

  • peak write performance
  • peak read performance
  • long-term storage volume

Another tip: Don't store anything about the text in the relational database. Instead, name/index the text using the id of the relational database row. That way, if you change your implementation, you don't have to re-jig your data model.

like image 186
Bohemian Avatar answered Jan 29 '23 13:01

Bohemian