
SQL vs NoSQL: what about other issues than ACID and scalability?

I have read quite a few articles lately which describe SQL and NoSQL from both sides of the divide, such as http://use-the-index-luke.com/blog/2013-04/whats-left-of-nosql. These articles very often touch on subjects like ACID and scalability. However, a number of issues I usually have with SQL seem to be rarely mentioned in these articles, and I was wondering why, and whether that has to do with me not entirely understanding SQL. If anyone can enlighten me, at least partly, on one or more of the following items, I would highly appreciate it.

My issues with SQL:

  1. SQL is inherently insecure: SQL is a language which was made for insertion and doesn't have any methods to prevent insertion of code instead of data. The only way to prevent insertion is to completely isolate SQL from the application using it. Why hasn't this been solved yet in SQL itself?
  2. SQL seems to have been made for the smallest storage size possible for the data contained in it. While that still makes a lot of sense for huge amounts of data, it doesn't really for smaller databases anymore, or does it?
  3. SQL forces everything to fit a two dimensional relational model, with specific relation tables to model the other dimensions. To me this poses two problems:
    • the consistency of data is completely relying on relation tables
    • the data is very difficult for humans to make sense of in cases of failures
  4. SQL doesn't maintain a history as it does destructive updates by default: there are of course all kinds of ways to create a history, but that requires custom written stuff with extra tables and the use of time stamps, or writing a new record for every change, leading to exponentially growing table sizes.
  5. SQL seems to prefer data loss to loss of consistency: if an error occurs or loss of consistency, the only way of restoring the situation to a consistent state is to use a backup, which means that the latest changes will be destroyed. This is partly because of the lack of history (see 4), but also, because of the lack of human readability, there is no real way to have a human try to correct errors.
  6. Especially in a web-environment, the use of SQL usually means having models created more than once. In a normal (simple) PHP web application twice: once in PHP, once in SQL. In a full stack web application three times: once in the client app, once in the middleware, and once in the SQL database (if no ORM is used). Because of the different programming languages, and the type differences between them, it means that there are many possible conflicts between those models. I am aware that ORMs like ActiveRecord and Django solve at least part of these problems, but the amount of extra work you still need to do because an SQL table contains a VARCHAR(25) and none of the languages used (JavaScript, Ruby, PHP, Perl, Python etc) know that kind of a construct, is huge.
  7. Data structure changes seem to be regarded as a consistency issue: if something changes in the data structure, table changes are applied to every existing record in that table, even if a record didn't have that field originally and whether or not it makes sense for that record to have that field. A set of those changes lead to automated migrations, which add another layer of possible problems, especially with regards to consistency.
  8. Mixing up of storage logic and application logic: SQL seems to be eager to gobble up parts of the application logic as stored procedures (CouchDB also does this through views). While I do understand that for some types of operations you need server side and very strictly controlled procedures, I don't understand why they are stored in the database and as such part of the storage engine, instead of being a part of the application (middleware).

I am aware of (but not very familiar with) things like PostgreSQL Hstore, but I don't see entirely how that solves the things mentioned above. Thanks for any insights!

asked Apr 26 '14 by mauritslamers



2 Answers

  1. Is SQL inherently insecure?

    I think you are referring to SQL injection, which is one of the most dangerous security vulnerabilities of all.

    However, SQL injection is primarily a problem of education, because most textbooks and courses don't explain bind parameters at all. Writing literal values into the SQL statement itself is handy for ad-hoc queries when humans use the database directly, but is simply the wrong way in programs. Programs should always use bind parameters (with very few exceptions for performance), effectively protecting the program 100% against SQL injection. The problem is that SQL textbooks don't say that.

    Even otherwise, SQL has sound security systems that allow you to limit access to tables, views, and sometimes even selected rows based on some rules ("row level security").
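
To make the bind-parameter point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `users` table and the hostile input are invented for illustration; the same pattern applies to any driver and any SQL database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

# Hostile input that breaks out of a string literal when concatenated:
evil = "x' OR '1'='1"

# Wrong: building SQL by string concatenation lets the input rewrite the query.
rows_bad = conn.execute(
    "SELECT * FROM users WHERE name = '" + evil + "'"
).fetchall()

# Right: a bind parameter treats the input strictly as data.
rows_good = conn.execute(
    "SELECT * FROM users WHERE name = ?", (evil,)
).fetchall()

print(len(rows_bad), len(rows_good))  # 1 0 — the injection matched everything
```

The concatenated version returns every row because the condition degenerates to `'1'='1'`; the bound version correctly matches nothing.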

  2. "smallest storage size possible"

    Sorry, I don't get that question.

  3. About normalization.

    You are right. Normalization solves a few problems (de-duplication and preventing unintentional inconsistencies) but opens some others. Namely:

    • How to easily access data from many tables.
    • How to maintain consistency across many tables.
    • How to cope with intentional "inconsistencies" aka. histories (master data changes)

    In principle, SQL should provide tools to compensate for these challenges raised by normalization.

    Accessing data from many tables should be done using joins and similar SQL operations. SQL does more than store and retrieve data in a 1:1 fashion; it provides tools (joins, subqueries, set operations, ...) to transform the normalized data into the form that is most suitable for a specific task. This is done intentionally at runtime because the tasks don't need to be known beforehand. The nature of the data, on the other hand, is considered to be static, so that storing it in a normalized fashion is valid. This is a very important key concept of the relational model and SQL: the nature of the data doesn't change, so that's the way it should be persisted. How you use that data varies widely and does often change over time, hence this has to be done dynamically. This is of course a very regular task, so it makes sense to have a solid tool to make it easy. We call this tool SQL ;) The DRY rule can be accomplished by using views or CTEs, but both may hurt performance because the implementations aren't well optimized for that (something that I openly criticize!).
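
As a small sketch of reassembling normalized data at runtime, consider this hypothetical customers/orders schema (SQLite via Python's sqlite3, purely for illustration). The join and aggregate produce the task-specific shape without the stored data ever needing to change:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id),
                         amount REAL);
    INSERT INTO customers VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5);
""")

# The join transforms the normalized tables into the shape this task needs:
# one row per customer with their order total.
report = conn.execute("""
    SELECT c.name, COALESCE(SUM(o.amount), 0) AS total
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.name
""").fetchall()

print(report)  # [('alice', 15.0), ('bob', 7.5)]
```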

    Keeping data consistent across many tables is mostly done with the help of constraints.
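
A minimal sketch of such a cross-table constraint, again with a made-up schema (note that SQLite, used here for illustration, requires foreign keys to be switched on explicitly):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this opt-in
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE books (id INTEGER PRIMARY KEY,
                        author_id INTEGER NOT NULL REFERENCES authors(id),
                        title TEXT NOT NULL);
    INSERT INTO authors VALUES (1, 'Codd');
""")

# Consistent: the referenced author exists.
conn.execute("INSERT INTO books VALUES (1, 1, 'A Relational Model')")

# Inconsistent: author 99 does not exist, so the database rejects the row.
try:
    conn.execute("INSERT INTO books VALUES (2, 99, 'Ghost Book')")
    accepted = True
except sqlite3.IntegrityError:
    accepted = False
print(accepted)  # False — the constraint kept the tables consistent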

    Handling intended "inconsistencies" (histories) was finally covered by SQL:2011: it allows "AS OF" queries and also provides tools to maintain temporal consistency (e.g. the validity of one row may not overlap with the validity of another row). Arguably, it is pretty poor that it took 40 years or so to come up with a solution for that. And I don't even know when this will be commonly available!
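
Until SQL:2011 support is widely available, the same idea can be emulated by hand with validity columns. A sketch of such an "AS OF" query over an invented price_history table (SQLite for illustration; ISO date strings compare correctly as text):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hand-rolled history: each row carries the period during which it was valid.
# SQL:2011 standardizes this pattern, but it can be emulated on any RDBMS.
conn.executescript("""
    CREATE TABLE price_history (
        product    TEXT,
        price      REAL,
        valid_from TEXT,
        valid_to   TEXT   -- exclusive upper bound; '9999-12-31' = current row
    );
    INSERT INTO price_history VALUES
        ('widget', 9.99,  '2014-01-01', '2014-03-01'),
        ('widget', 12.49, '2014-03-01', '9999-12-31');
""")

def price_as_of(day):
    # The "AS OF" query: which row was valid on the given day?
    row = conn.execute("""
        SELECT price FROM price_history
        WHERE product = 'widget' AND valid_from <= ? AND ? < valid_to
    """, (day, day)).fetchone()
    return row[0]

print(price_as_of("2014-02-15"))  # 9.99
print(price_as_of("2014-04-26"))  # 12.49
```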

    I think that part is pretty much true for every system: "the data is very difficult for humans to make sense of in cases of failures" (my emphasis). However, I think you might mean that it is hard to investigate issues because the required data might be spread across several tables. SQL's answer to that is: VIEWs, which are basically just stored queries. However, depending on the database brand, VIEWs might introduce performance issues. That is, however, a limitation of some database brands, not a limitation of SQL or the relational model.
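
A sketch of such a stored query: the ticket_overview view below flattens a made-up two-table schema into one readable relation, so a human investigating a case never has to spell out the join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE tickets (id INTEGER PRIMARY KEY,
                          customer_id INTEGER, subject TEXT);
    INSERT INTO customers VALUES (1, 'alice', 'alice@example.com');
    INSERT INTO tickets VALUES (100, 1, 'login broken');

    -- The stored query: support staff see one flat, readable relation.
    CREATE VIEW ticket_overview AS
        SELECT t.id, c.name, c.email, t.subject
        FROM tickets t JOIN customers c ON c.id = t.customer_id;
""")

overview = conn.execute("SELECT * FROM ticket_overview").fetchall()
print(overview)  # [(100, 'alice', 'alice@example.com', 'login broken')]
```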

  4. Keeping History

    I've already mentioned that above (SQL:2011).

    The following is also true for every system that wants to keep history: "leading to exponentially growing table sizes." Although I'd say it's "ever growing", not "exponentially".

    The tools to cope with it are triggers or ORMs. If you want to be sure nobody does a "destructive update", you can just revoke UPDATE rights on that table (and also DELETE, to be on the safe side).
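
A minimal trigger-based history sketch (invented accounts schema; SQLite syntax shown, but the trigger idea is the same in every major RDBMS — REVOKE is left out only because SQLite has no user management):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL);
    CREATE TABLE accounts_history (
        id INTEGER, old_balance REAL, new_balance REAL,
        changed_at TEXT DEFAULT CURRENT_TIMESTAMP
    );
    -- Every update automatically leaves an audit row behind.
    CREATE TRIGGER accounts_audit AFTER UPDATE ON accounts
    BEGIN
        INSERT INTO accounts_history (id, old_balance, new_balance)
        VALUES (OLD.id, OLD.balance, NEW.balance);
    END;
    INSERT INTO accounts VALUES (1, 100.0);
""")

conn.execute("UPDATE accounts SET balance = 80.0 WHERE id = 1")
conn.execute("UPDATE accounts SET balance = 95.0 WHERE id = 1")

history = conn.execute(
    "SELECT old_balance, new_balance FROM accounts_history ORDER BY rowid"
).fetchall()
print(history)  # [(100.0, 80.0), (80.0, 95.0)]
```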

  5. "prefer data loss to loss of consistency: "

    I find that an interesting point of view. However, the SQL answer to this is that you try very hard not to get wrong data into the system in the first place, mostly by using a proper schema, constraints + ACID. In that way your statement is somewhat right: instead of being accepted, inconsistent data is rejected (which is something different from lost!). So you must handle the error at the time somebody is entering the rejected data, as opposed to some later time when you try to resolve inconsistencies because you accepted the bad data in the first place. So yes, that's the philosophy of the relational model and SQL!
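
A small sketch of rejecting bad data at entry time via CHECK constraints (the bookings table and its rules are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE bookings (
        id INTEGER PRIMARY KEY,
        nights INTEGER NOT NULL CHECK (nights > 0),
        email TEXT NOT NULL CHECK (email LIKE '%@%')
    )
""")

# Good data is accepted...
conn.execute("INSERT INTO bookings VALUES (1, 3, 'a@b.example')")

# ...bad data is rejected at entry time, not discovered (or lost) later.
try:
    conn.execute("INSERT INTO bookings VALUES (2, -1, 'not-an-email')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```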

    The lack of human readability obviously depends on your background. However, the correct-ability using SQL is pretty nice, I'd say. Here I'd also like to quote from the original IBM paper about SEQUEL (at that time that was its real name):

    SEQUEL is intended as a data base sublanguage for both the professional programmer and the more infrequent data base user.

    In my observation, it is absolutely true: I've recently had an assignment to teach SQL to support staff so that they could investigate cases directly in the database. They were not programmers, yet they understood SQL quite quickly. And I think this is where your "human" argument kicks in: what they had problems with was navigating a real-world relational model consisting of several hundred tables. But that issue was quickly solved by asking development to provide views for some common tasks that involve more than a few tables. Joining those views then was no problem anymore.

    For relational thinking you need a different mindset, very much as you need a different mindset for functional programming. That's not good or bad, yet it might be uncommon for you. Once you use it regularly you'll get used to it.

  6. Object/Relational Impedance Mismatch

    I think that topic doesn't need any long discussion: yes it exists, yes there are tools to cope with it in some way or another. I've made my point about excessive use in my article.

  7. Data structure changes

    I think that is mostly due to a poor understanding of the relational model in the first place. Compare above: "the nature of the data"

    It's also a pretty well discussed argument: schema vs. "schema-less". Choose your flavor. "Schema-less" quite often just means "doesn't provide schema management tools"; nevertheless you have to cope with the fact that we sometimes want to add more properties to an existing entity. RDBMSs provide tools for that: new columns can be nullable or have default values. More drastic changes, such as moving one attribute to an extra table (e.g. 1:n), can be done with CREATE TABLE ... AS SELECT. You might even provide a compatibility view that still delivers the data as it ought to be (as though the moved attribute were still stored in the table). Once you have changed your schema, your application can rely on its constraints (such as the existence of columns or the validity of constraints). That's quite a lot of stuff the database can do for you in an extremely reliable manner. Stuff you don't need to care about in your app anymore.
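
A sketch of the "new columns can be nullable or have default values" point, with an invented people table: the existing row is not rewritten, it simply reads NULL or the declared default through the new columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO people VALUES (1, 'alice');
""")

# Adding a nullable column, or one with a constant default, is a pure
# catalog change: old records are not touched.
conn.execute("ALTER TABLE people ADD COLUMN nickname TEXT")
conn.execute("ALTER TABLE people ADD COLUMN country TEXT DEFAULT 'NL'")

rows = conn.execute("SELECT * FROM people").fetchall()
print(rows)  # [(1, 'alice', None, 'NL')]
```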

    An argument you didn't mention is that those schema changes often involve downtime. That was definitely true in the past, and to some extent is still true today. E.g. MySQL introduced online ALTER TABLE only recently, in 5.6. However, that's quite often an implementation limitation, not a problem inherently tied to the relational model or SQL. Even some more complex changes (like moving an attribute to another table) can be done online when done right and planned carefully (I've done that with one of the expensive databases that provide all the tools you need for it). Generally it is about keeping the migration code out of your application and coping with it in the database. After the migration you should have migration artifacts neither in the DB nor in the application code. Of course, there are cases where a downtime is inevitable (I think ;).

  8. "Mixing up of storage logic and application logic"

    SQL actually does the exact opposite: SQL abstracts the storage layer completely away.

    Nobody forces you to use stored procedures. I personally also think that stored procedures are overused, mostly because stored procedures are stored in the database and can thus be changed (optimized) by database administrators that might not have access to other source code. In other words: I think it is often done out of desperation.

    The second argument is, of course, the excessive use of ORMs again and policies that disallow the use of real SQL in the application.

answered Nov 04 '22 by Markus Winand


  1. SQL is inherently insecure. No it isn't. Use parameterized statements.
  2. SQL seems to have been made for the smallest storage size possible. That is only a secondary consequence of reducing duplication via normalization. The benefits of normalization are smaller updates and simplified constraint checking.
  3. SQL forces everything to fit a two dimensional relational model. This is somewhat true for DML (querying) but not for DDL (data definition), which is by far more important. Modern SQL breaks the 2D composition anyway via technologies like Postgres JSON.
  4. SQL doesn't maintain a history: this is completely bogus. It's trivial to maintain history via triggers, but this should absolutely not be the default. Data time travel is incredibly space intensive, but, if you must do it, will be easier and smaller in SQL vs noSQL due to normalization and being able to run RI through time if you need formal historical constraint checking.
  5. SQL seems to prefer data loss to loss of consistency: This doesn't follow. There are many ways to restore consistency outside of restoring from backups.
  6. In a full stack web application three times. This is true. But you are looking to cut fat in the wrong place. SQL is a very formal definition of the data. Consider dropping the ORM and cutting the client side down.
  7. Data structure changes seem to be regarded as a consistency issue: This is just flat out false. You can leave a new column default NULL, which in most SQL implementations will not cause the records to be rewritten (ALTER TABLE foo ADD f TEXT; is O(1)). Also note that Postgres structure changes are transactional and can be rolled back.
  8. Mixing up of storage logic and application logic: That's a pretty silly argument. It's analogous to saying C binaries should not be compiled with data structures. There is no such thing as 'storage logic, business logic, or application logic'. There's just logic. Coupling logic more tightly to the data structures greatly improves performance and makes for easier management of functional dependencies. I will never understand why programmers think it's 'better' to have the database serialize a record through a protocol, move it into an application on another process (or even another server) and write it back to the database via deserialization, instead of letting the database do it.
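
The "transactional and can be rolled back" part of point 7 can be sketched in runnable form. SQLite is used here purely for illustration (it, like Postgres, runs DDL inside transactions); the table is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # manage transactions explicitly
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")

conn.execute("BEGIN")
conn.execute("ALTER TABLE t ADD COLUMN extra TEXT")  # cheap, catalog-only
cols_inside = [r[1] for r in conn.execute("PRAGMA table_info(t)")]
conn.execute("ROLLBACK")  # the schema change is undone with the transaction
cols_after = [r[1] for r in conn.execute("PRAGMA table_info(t)")]

print(cols_inside)  # ['id', 'extra']
print(cols_after)   # ['id']
```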
answered Nov 04 '22 by user32365