
Good reasons NOT to use a relational database?

Why is a non-relational database better?

Non-relational databases can be faster for some queries because answering them doesn't require joining several tables, as queries against a relational database often do. That makes them well suited to storing data that changes frequently, or to applications that handle many different kinds of data.
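To make the "several tables" point concrete, here is a minimal sketch using Python's built-in sqlite3 module, with a plain dict standing in for a document store. The tables, field names and data are made up for illustration, and nothing here measures performance: the relational layout needs a JOIN to answer "who placed order 1?", while the denormalized document answers it with one key lookup.

```python
import sqlite3

# Relational layout: customers and orders live in separate tables,
# so "who placed order 1, and for how much?" needs a JOIN.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Alice');
    INSERT INTO orders VALUES (1, 1, 9.5);
""")
row = conn.execute(
    "SELECT c.name, o.total FROM orders o "
    "JOIN customers c ON c.id = o.customer_id WHERE o.id = ?",
    (1,),
).fetchone()

# Document layout (hypothetical): the customer's name is copied into the
# order document, so a single key lookup returns everything needed.
orders_by_id = {1: {"customer_name": "Alice", "total": 9.5}}
doc = orders_by_id[1]

print(row)  # ('Alice', 9.5)
print(doc["customer_name"], doc["total"])  # Alice 9.5
```

The price is duplication: if Alice changes her name, the relational version updates one row, while the document version has to find and rewrite every order that copied it.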


Plain text files in a filesystem

  • Very simple to create and edit
  • Easy for users to manipulate with simple tools (e.g. text editors, grep, etc.)
  • Efficient storage of binary documents
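A minimal sketch of the plain-files approach in Python, assuming one UTF-8 note per .txt file in a directory. The note names and the helpers save_note and search are hypothetical; search does roughly what `grep -n` would do from a shell.

```python
import tempfile
from pathlib import Path

def save_note(directory: Path, name: str, text: str) -> Path:
    """Store one note as a plain UTF-8 text file that any editor can open."""
    path = directory / f"{name}.txt"
    path.write_text(text, encoding="utf-8")
    return path

def search(directory: Path, needle: str) -> list:
    """Return 'file:line:text' hits, roughly what `grep -n needle *.txt` prints."""
    hits = []
    for path in sorted(directory.glob("*.txt")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if needle in line:
                hits.append(f"{path.name}:{lineno}:{line}")
    return hits

with tempfile.TemporaryDirectory() as tmp:
    notes = Path(tmp)
    save_note(notes, "todo", "buy milk\nfix backup script\n")
    save_note(notes, "ideas", "automate the backup\n")
    hits = search(notes, "backup")

print(hits)  # ['ideas.txt:1:automate the backup', 'todo.txt:2:fix backup script']
```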

XML or JSON files on disk

  • As above, but with a bit more ability to validate the structure.
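As a rough illustration of "a bit more ability to validate the structure", here is a sketch that checks required keys and types by hand with Python's standard json module. For anything serious you would more likely use XML Schema or JSON Schema; the record shape in REQUIRED and the helper names are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical record shape: every record needs these keys with these types.
REQUIRED = {"id": int, "title": str, "tags": list}

def validate(record: dict) -> list:
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    for key, expected in REQUIRED.items():
        if key not in record:
            problems.append(f"missing {key!r}")
        elif not isinstance(record[key], expected):
            problems.append(f"{key!r} should be {expected.__name__}")
    return problems

def load_records(path: Path) -> list:
    """Parse a JSON array of records and reject the file if any record is malformed."""
    records = json.loads(path.read_text(encoding="utf-8"))
    for i, record in enumerate(records):
        problems = validate(record)
        if problems:
            raise ValueError(f"record {i}: {', '.join(problems)}")
    return records

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "records.json"
    path.write_text(json.dumps([{"id": 1, "title": "First", "tags": ["a"]}]), encoding="utf-8")
    loaded = load_records(path)

problems = validate({"id": "1", "title": "Second"})
print(problems)  # ["'id' should be int", "missing 'tags'"]
```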

Spreadsheet / CSV file

  • Very easy model for business users to understand
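A small sketch of round-tripping CSV with Python's standard csv module (the sales figures are made up). It works in memory to stay self-contained; with a real file you would open it with newline="" as the csv documentation recommends. Note that CSV carries no types, so numbers come back as strings.

```python
import csv
import io

# Made-up sales figures; a spreadsheet can open the resulting text directly.
rows = [
    {"region": "North", "sales": "1200"},
    {"region": "South", "sales": "950"},
]

# Write a header row plus one line per record.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["region", "sales"])
writer.writeheader()
writer.writerows(rows)
text = buf.getvalue()

# Read it back: every value is a string until converted explicitly.
total = sum(int(row["sales"]) for row in csv.DictReader(io.StringIO(text)))
print(text.splitlines()[0])  # region,sales
print(total)                 # 2150
```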

Subversion (or similar disk based version control system)

  • Very good support for versioning of data

Berkeley DB (basically, a disk-based hash table)

  • Very simple conceptually (just untyped key/value pairs)
  • Quite fast
  • No administration overhead
  • Supports transactions, I believe
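The Python 3 standard library no longer includes Berkeley DB bindings, but its dbm module gives the same flavour of untyped, on-disk key/value store, so here is a sketch using it as a stand-in (it is not Berkeley DB itself, and the file name and keys are made up). Which dbm backend gets used depends on the platform.

```python
import dbm
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cache")
    with dbm.open(path, "c") as db:   # "c": create the database if it doesn't exist
        db["user:42"] = "Alice"       # str keys/values are stored as bytes
        db[b"user:43"] = b"Bob"
    with dbm.open(path, "r") as db:   # reopen read-only to show the data persisted
        value = db["user:42"]
        keys = sorted(db.keys())

print(value)  # b'Alice'
print(keys)   # [b'user:42', b'user:43']
```

Everything comes back as bytes: interpreting the values (as JSON, numbers, etc.) is the application's job, which is exactly the "un-typed" trade-off.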

Amazon's Simple DB

  • Much like Berkeley DB, I believe, but hosted

Google's App Engine Datastore

  • Hosted and highly scalable
  • Per-document key/value storage (i.e. a flexible data model)

CouchDB

  • Document focus
  • Simple storage of semi-structured / document based data

Native language collections (stored in memory or serialised on disk)

  • Very tight language integration
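A minimal sketch of "native collections serialised on disk" in Python, using the standard pickle module on a plain dict of sets, lists and tuples (the data is hypothetical). There is no schema or mapping layer: the whole structure goes to disk and comes back in one call each way.

```python
import pickle
import tempfile
from pathlib import Path

# Plain in-memory collections (hypothetical data).
state = {
    "tags": {"alice": {"admin", "dev"}, "bob": {"dev"}},
    "counters": {"logins": 42},
    "history": [("alice", 1), ("bob", 2)],
}

def save(obj, path: Path) -> None:
    # One call serialises the whole structure, sets and tuples included.
    path.write_bytes(pickle.dumps(obj))

def load(path: Path):
    # Only unpickle files you trust: loading a pickle can execute arbitrary code.
    return pickle.loads(path.read_bytes())

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / "state.pickle"
    save(state, p)
    restored = load(p)

print(restored == state)  # True
```

The flip side of the tight integration is that the file is only readable from Python; a format like JSON travels further but loses sets and tuples.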

Custom (hand-written) storage engine

  • Potentially very high performance in required use cases
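To give a feel for what a hand-written engine can look like, here is a hypothetical minimal one in Python, loosely in the spirit of log-structured stores such as Bitcask: an append-only data file plus an in-memory index of byte offsets. It is a sketch under heavy simplifying assumptions (no tabs or newlines in keys/values, no compaction, no checksums), not a production design.

```python
import os
import tempfile
from typing import Optional

class LogStore:
    """Hypothetical minimal storage engine: an append-only file of
    'key<TAB>value' lines plus an in-memory dict of key -> byte offset.
    Writes are sequential appends and reads are a single seek."""

    def __init__(self, path: str):
        self.path = path
        self.index = {}
        if os.path.exists(path):
            self._rebuild_index()

    def _rebuild_index(self) -> None:
        # Replay the whole log on startup; later records for a key win.
        offset = 0
        with open(self.path, "rb") as f:
            for line in f:
                key = line.split(b"\t", 1)[0].decode("utf-8")
                self.index[key] = offset
                offset += len(line)

    def put(self, key: str, value: str) -> None:
        record = f"{key}\t{value}\n".encode("utf-8")
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)  # where this record will start
            f.write(record)
        self.index[key] = offset  # old bytes are never rewritten

    def get(self, key: str) -> Optional[str]:
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)
            line = f.readline()
        return line.rstrip(b"\n").split(b"\t", 1)[1].decode("utf-8")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.log")
    store = LogStore(path)
    store.put("a", "1")
    store.put("b", "2")
    store.put("a", "3")            # an "update" is just another append
    reopened = LogStore(path)      # simulate a restart: index rebuilt from the log
    a_value, b_value, missing = reopened.get("a"), reopened.get("b"), reopened.get("zzz")

print(a_value, b_value, missing)  # 3 2 None
```

The performance win comes from matching the engine to the workload (here: write-heavy, point lookups only); the cost is that everything an RDBMS gives you for free, such as queries, compaction and crash recovery, is now your problem.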

I can't claim to know anything much about them, but you might also like to look into object database systems.


Matt Sheppard's answer is great (mod up), but I would take these factors into account when thinking about a spindle:

  1. Structure : does it obviously break into pieces, or are you making tradeoffs?
  2. Usage : how will the data be analyzed/retrieved/grokked?
  3. Lifetime : how long is the data useful?
  4. Size : how much data is there?

One particular advantage of CSV files over RDBMSes is that they are easy to compress and move to practically any other machine. We do large data transfers, and everything is simple enough that we just use one big CSV file, which is easy to script with tools like rsync. To reduce repetition in big CSV files, you could use something like YAML. I'm not sure I'd store anything like JSON or XML unless you had significant relationship requirements.
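As a sketch of the "one big compressed CSV" workflow, assuming gzip is the compression (the answer doesn't say) and using made-up, highly repetitive rows: serialise to CSV, gzip it into a single blob you can rsync or scp, then decompress and parse on the other side.

```python
import csv
import gzip
import io

def to_csv_text(rows, fieldnames) -> str:
    """Serialise a list of dicts as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical, highly repetitive rows, typical of machine-generated data.
rows = [{"host": f"web{i % 20:02d}", "func": "render", "ms": str(i % 7)} for i in range(10_000)]

raw = to_csv_text(rows, ["host", "func", "ms"]).encode("utf-8")
packed = gzip.compress(raw)  # this single blob is what gets copied between machines

# On the receiving machine: decompress and parse back into the same rows.
unpacked = list(csv.DictReader(io.StringIO(gzip.decompress(packed).decode("utf-8"))))
print(len(raw), len(packed))
```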

As far as not-mentioned alternatives, don't discount Hadoop, which is an open source implementation of MapReduce. This should work well if you have a TON of loosely structured data that needs to be analyzed, and you want to be in a scenario where you can just add 10 more machines to handle data processing.
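To show the shape of the MapReduce model that Hadoop implements, here is the classic word count as a single-process Python sketch. It is only an illustration of the map, shuffle and reduce phases, not Hadoop's API; in Hadoop each phase runs in parallel across machines and the framework does the shuffle for you.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input split: in Hadoop each mapper would get a chunk of a big file.
lines = ["the quick fox", "the lazy dog", "the fox"]

def map_phase(line):
    # Emit a (key, value) pair per word.
    for word in line.split():
        yield word, 1

def shuffle(pairs):
    # Group all values by key; the framework does this between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Combine every value seen for one key.
    return key, sum(values)

pairs = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(counts)  # {'the': 3, 'quick': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

Because map and reduce only ever see one record or one key at a time, adding machines just means splitting the input and the key space more ways.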

For example, I started trying to analyze performance data that was essentially all timing numbers for different functions, logged across around 20 machines. After trying to stick everything in an RDBMS, I realized that I really don't need to query the data again once I've aggregated it, and it's only useful to me in its aggregated form. So I keep the log files around, compressed, and leave the aggregated data in a DB.

Note I'm more used to thinking with "big" sizes.
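The keep-the-raw-logs, store-only-the-aggregates workflow described above might look something like this sketch. Assumptions: each machine writes a gzip-compressed log of "function elapsed_ms" lines (a made-up format), and SQLite stands in for whatever DB holds the aggregates.

```python
import gzip
import sqlite3
import tempfile
from collections import defaultdict
from pathlib import Path

def aggregate_logs(paths):
    """Stream compressed 'function elapsed_ms' lines and total them per function."""
    totals = defaultdict(lambda: [0, 0.0])  # function -> [calls, total_ms]
    for path in paths:
        with gzip.open(path, "rt", encoding="utf-8") as f:
            for line in f:
                func, ms = line.split()
                totals[func][0] += 1
                totals[func][1] += float(ms)
    return totals

def store_aggregates(conn, totals):
    conn.execute("CREATE TABLE IF NOT EXISTS timing (func TEXT PRIMARY KEY, calls INTEGER, mean_ms REAL)")
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT OR REPLACE INTO timing VALUES (?, ?, ?)",
            [(func, calls, total / calls) for func, (calls, total) in totals.items()],
        )

with tempfile.TemporaryDirectory() as tmp:
    # Two hypothetical per-machine logs, kept compressed on disk.
    logs = []
    for name, body in [("m01.log.gz", "parse 10\nrender 4\n"), ("m02.log.gz", "parse 14\n")]:
        p = Path(tmp) / name
        with gzip.open(p, "wt", encoding="utf-8") as f:
            f.write(body)
        logs.append(p)
    conn = sqlite3.connect(":memory:")
    store_aggregates(conn, aggregate_logs(logs))
    summary = conn.execute("SELECT func, calls, mean_ms FROM timing ORDER BY func").fetchall()

print(summary)  # [('parse', 2, 12.0), ('render', 1, 4.0)]
```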


The filesystem's pretty handy for storing binary data, which never works amazingly well in relational databases.
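One common way to combine the two is to put the bytes in the filesystem and keep only metadata in the database. This Python sketch names each file after the SHA-256 of its contents (a design choice of this sketch, not something the answer prescribes), so identical uploads are stored once; the table and file names are hypothetical.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path

def store_blob(conn, blob_dir: Path, name: str, data: bytes) -> str:
    """Write the bytes to a file named after their SHA-256; keep only metadata in the DB."""
    digest = hashlib.sha256(data).hexdigest()
    path = blob_dir / digest
    if not path.exists():  # identical content is stored only once
        path.write_bytes(data)
    with conn:
        conn.execute(
            "INSERT INTO files (name, sha256, size) VALUES (?, ?, ?)",
            (name, digest, len(data)),
        )
    return digest

def load_blob(conn, blob_dir: Path, name: str) -> bytes:
    (digest,) = conn.execute("SELECT sha256 FROM files WHERE name = ?", (name,)).fetchone()
    return (blob_dir / digest).read_bytes()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (name TEXT PRIMARY KEY, sha256 TEXT NOT NULL, size INTEGER NOT NULL)")

with tempfile.TemporaryDirectory() as tmp:
    blob_dir = Path(tmp)
    data = bytes(range(256)) * 4  # stand-in for an image or PDF
    store_blob(conn, blob_dir, "logo.png", data)
    store_blob(conn, blob_dir, "logo-copy.png", data)  # same bytes, second name
    restored = load_blob(conn, blob_dir, "logo.png")
    files_on_disk = len(list(blob_dir.iterdir()))

print(restored == data, files_on_disk)  # True 1
```

The database stays small and queryable, while the files can be served, backed up or rsynced with ordinary tools.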


Try Prevayler: http://www.prevayler.org/wiki/ Prevayler is an alternative to an RDBMS; the site has more info.
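Prevayler is built around "system prevalence": keep all the state in memory and make it durable by journaling every change. This Python sketch shows only that general idea and is not Prevayler's Java API; the class, the command format and the journal file are all hypothetical, and real implementations also take periodic snapshots so the journal doesn't have to be replayed from the beginning.

```python
import json
import os
import tempfile

class PrevalentBank:
    """Hypothetical sketch of system prevalence: all state lives in memory,
    every change is a command written to a journal *before* it is applied,
    and on restart the journal is replayed to rebuild the state."""

    OPS = {"deposit", "withdraw"}

    def __init__(self, journal_path: str):
        self.journal_path = journal_path
        self.balances = {}
        if os.path.exists(journal_path):
            with open(journal_path, encoding="utf-8") as f:
                for line in f:
                    self._apply(json.loads(line))

    def _apply(self, cmd: dict) -> None:
        # Commands must be deterministic so replaying them gives the same state.
        sign = 1 if cmd["op"] == "deposit" else -1
        self.balances[cmd["acct"]] = self.balances.get(cmd["acct"], 0) + sign * cmd["amount"]

    def execute(self, cmd: dict) -> None:
        if cmd.get("op") not in self.OPS:
            raise ValueError(f"unknown command: {cmd!r}")
        with open(self.journal_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(cmd) + "\n")
            f.flush()
            os.fsync(f.fileno())  # journal entry is on disk before state changes
        self._apply(cmd)

with tempfile.TemporaryDirectory() as tmp:
    journal = os.path.join(tmp, "journal.jsonl")
    bank = PrevalentBank(journal)
    bank.execute({"op": "deposit", "acct": "alice", "amount": 100})
    bank.execute({"op": "withdraw", "acct": "alice", "amount": 30})
    recovered = PrevalentBank(journal).balances  # simulate a restart

print(recovered)  # {'alice': 70}
```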


If you don't need ACID, you probably don't need the overhead of an RDBMS. So, determine whether you need that first. Most of the non-RDBMS answers provided here do not provide ACID.
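To see what ACID buys you, here is a sketch using Python's built-in sqlite3 module (table and account names are made up). A transfer is two UPDATEs; when the second one violates a CHECK constraint, the transaction rolls back and the first UPDATE is undone too, so money never appears or disappears halfway.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])

def transfer(conn, src, dst, amount):
    # `with conn` wraps both UPDATEs in one transaction: commit if both
    # succeed, roll back both if either fails (atomicity).
    with conn:
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))

transfer(conn, "alice", "bob", 30)
try:
    transfer(conn, "alice", "bob", 500)  # would make alice negative: CHECK fails
except sqlite3.IntegrityError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 70, 'bob': 30}
```

With a plain file or most key/value stores you would have to build that all-or-nothing behaviour yourself, which is the overhead question the answer is pointing at.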


Custom (hand-written) storage engine / Potentially very high performance in required use cases

http://www.hdfgroup.org/

If you have enormous data sets, instead of rolling your own, you might use HDF, the Hierarchical Data Format.

http://en.wikipedia.org/wiki/Hierarchical_Data_Format:

HDF supports several different data models, including multidimensional arrays, raster images, and tables.

It's also hierarchical like a file system, but the data is stored in one magic binary file.

HDF5 is a suite that makes possible the management of extremely large and complex data collections.

Think petabytes of NASA/JPL remote sensing data.


G'day,

One case that I can think of is when the data you are modelling cannot be easily represented in a relational database.

One such example is the database used by mobile phone operators to monitor and control base stations in mobile telephone networks.

In almost all of these cases, an OO DB is used: either a commercial product or a self-rolled system that allows hierarchies of objects.

I've worked on a 3G monitoring application for a large company who will remain nameless, but whose logo is a red wine stain (-: , and they used such an OO DB to keep track of all the various attributes for individual cells within the network.

Interrogation of such DBs is done using proprietary techniques that are, usually, completely free from SQL.
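For a feel of what "hierarchies of objects, queried without SQL" means, here is a small Python sketch using the standard shelve module (which pickles Python objects by key). The network model, IDs and attributes are all hypothetical, and to keep pickling simple it uses nested dicts rather than custom classes; a real OO DB would store actual objects with methods and handle concurrency, indexing and so on.

```python
import os
import shelve
import tempfile

# Hypothetical network model: base station -> cells -> attributes.
network = {
    "BS-0001": {
        "site": "hilltop",
        "cells": {
            "C1": {"band": "2100MHz", "tx_power_dbm": 43, "state": "up"},
            "C2": {"band": "2100MHz", "tx_power_dbm": 40, "state": "down"},
        },
    },
    "BS-0002": {
        "site": "station",
        "cells": {"C1": {"band": "900MHz", "tx_power_dbm": 46, "state": "up"}},
    },
}

def cells_down(net):
    # Interrogate by walking the hierarchy instead of writing SQL joins.
    return [
        (bs_id, cell_id)
        for bs_id, bs in net.items()
        for cell_id, cell in bs["cells"].items()
        if cell["state"] == "down"
    ]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "network")
    with shelve.open(path) as db:            # persists pickled objects by key
        db["network"] = network
    with shelve.open(path, flag="r") as db:  # reopen read-only
        stored = db["network"]
        down = cells_down(stored)

print(down)  # [('BS-0001', 'C2')]
```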

HTH.

cheers,

Rob