I have read quite a few articles lately that describe SQL and NoSQL from both sides of the divide, such as http://use-the-index-luke.com/blog/2013-04/whats-left-of-nosql. These articles very often touch on subjects like ACID and scalability. However, a number of issues I usually have with SQL are rarely mentioned in these articles, and I wonder why that is, and whether it has to do with me not entirely understanding SQL. If anyone can enlighten me, at least partly, on one or more of the following items, I would greatly appreciate it.
My issues with SQL:
I am aware of (but not very familiar with) things like PostgreSQL hstore, but I don't entirely see how that solves the things mentioned above. Thanks for any insights!
Most SQL databases scale vertically, which means that you increase the capacity of a single server by upgrading components like RAM, SSDs, or CPUs. In contrast, NoSQL databases are typically horizontally scalable, which means that they can handle increased traffic by adding more servers to the database.
SQL is the programming language used to interface with relational databases. (Relational databases model data as records in rows and tables, with logical links between them.) NoSQL is a class of DBMSs that are non-relational and generally do not use SQL.
Is SQL inherently insecure?
I think you are referring to SQL injection, which is one of the most dangerous security vulnerabilities of all.
However, SQL injection is primarily a problem of education, because most textbooks and courses don't explain bind parameters at all. Writing literal values into the SQL statement itself is handy for ad-hoc queries when humans use the database directly, but it is simply the wrong approach in programs. Programs should always use bind parameters (with very few exceptions for performance), which effectively protects the program 100% against SQL injection. The problem is that SQL textbooks don't say that.
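As a minimal sketch of the difference (using Python's sqlite3 module purely for illustration; the placeholder syntax varies by driver):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

user_input = "alice' OR '1'='1"  # a typical injection attempt

# Wrong: writing the literal value into the statement itself (injectable):
#   "SELECT id, name FROM users WHERE name = '" + user_input + "'"

# Right: a bind parameter; the driver keeps the value out of the SQL text.
rows = conn.execute(
    "SELECT id, name FROM users WHERE name = ?", (user_input,)
).fetchall()
print(rows)  # [] -- the injection attempt matches no row
```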
Even apart from that, SQL has sound security mechanisms that allow you to limit access to tables, views, and sometimes even selected rows based on rules ("row-level security").
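SQLite has no GRANT/REVOKE, so as a rough stand-in for the idea: a view can expose only the rows and columns a class of users is allowed to see; in a server database such as PostgreSQL you would grant access to the view and revoke it on the base table (table names here are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL);
    INSERT INTO orders (region, amount) VALUES ('EU', 100.0), ('US', 250.0);

    -- Expose only the EU rows. In a server database you would GRANT
    -- SELECT on the view and revoke access to the base table.
    CREATE VIEW eu_orders AS
        SELECT id, amount FROM orders WHERE region = 'EU';
""")
print(conn.execute("SELECT * FROM eu_orders").fetchall())  # [(1, 100.0)]
```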
"smallest storage size possible"
Sorry, I don't get that question.
About normalization.
You are right: normalization solves a few problems (de-duplication and preventing unintentional inconsistencies) but opens some others, which I go through below. In principle, SQL should provide tools to compensate for these challenges raised by normalization.
Accessing data from many tables should be done using joins and similar SQL operations. SQL does more than store and retrieve data in a 1:1 fashion; it provides tools (joins, subqueries, set operations, ...) to transform the normalized data into the form that is most suitable for a specific task. This is done intentionally at runtime because the tasks don't need to be known beforehand.

The nature of the data, on the other hand, is considered to be static, so storing it in a normalized fashion is valid. This is a very important key concept of the relational model and SQL: the nature of the data doesn't change, so that's the way it should be persisted. How you use that data varies widely and often changes over time, hence this has to be done dynamically. That is of course a very regular task, so it makes sense to have a solid tool that makes it easy. We call this tool SQL ;) The DRY rule can be accomplished by using views or CTEs, but both may hurt performance because the implementations aren't well optimized for that (something that I openly criticize!).
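To make that concrete, here is a small sketch (again Python/sqlite3, with invented table names) of normalized data being re-shaped at runtime by a join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized: each customer is stored once and referenced by orders.
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id),
                         amount REAL);
    INSERT INTO customers VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 25.0), (3, 2, 5.0);
""")

# The join re-shapes the normalized data at runtime into the form this
# particular task needs: one row per customer with the order total.
for row in conn.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
"""):
    print(row)  # ('alice', 35.0) then ('bob', 5.0)
```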
Keeping data consistent across many tables is mostly done with the help of constraints.
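For example (a sketch with hypothetical tables), a foreign key constraint keeps the link between two tables consistent:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER NOT NULL
                                     REFERENCES customers(id));
    INSERT INTO customers VALUES (1);
""")
conn.execute("INSERT INTO orders VALUES (1, 1)")       # ok: customer 1 exists
try:
    conn.execute("INSERT INTO orders VALUES (2, 99)")  # no such customer
except sqlite3.IntegrityError as e:
    print("rejected:", e)  # FOREIGN KEY constraint failed
```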
Handling intended "inconsistencies" (histories) was finally covered by SQL:2011: it allows "AS OF" queries and also provides tools to maintain temporal consistency (e.g. the validity of one row may not overlap with the validity of another row). Arguably, it is pretty poor that it took 40 years or so to come up with a solution for that. And I don't even know when this will be commonly available!
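Until then, histories are typically hand-rolled; here is a sketch of how an "AS OF" query can be emulated with explicit validity columns (table and column names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hand-rolled versioning: every version of a row carries its
    -- validity interval (valid_to IS NULL marks the current version).
    CREATE TABLE price_history (
        product_id INTEGER,
        price      REAL,
        valid_from TEXT,
        valid_to   TEXT
    );
    INSERT INTO price_history VALUES
        (1,  9.99, '2013-01-01', '2013-06-01'),
        (1, 12.99, '2013-06-01', NULL);
""")

# Emulated "AS OF" query: which price was valid on 2013-03-15?
row = conn.execute("""
    SELECT price FROM price_history
    WHERE product_id = 1
      AND valid_from <= '2013-03-15'
      AND (valid_to IS NULL OR valid_to > '2013-03-15')
""").fetchone()
print(row)  # (9.99,)
```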
I think that part is pretty much true for every system: "the data is very difficult for humans to make sense of in cases of failures" (my emphasis). However, I think you might mean that it is hard to investigate issues because the required data might be spread across several tables. SQL's answer to that is VIEWs, which are basically just stored queries. However, depending on the database brand, VIEWs might introduce performance issues. That is, however, a limitation of some database brands, not a limitation of SQL or the relational model.
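For example, a view can pre-join the tables that a typical investigation needs (a sketch with an invented schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'alice');
    INSERT INTO orders VALUES (1, 1, 10.0);

    -- A stored query: investigators query one readable "table" instead
    -- of reassembling the join by hand every time.
    CREATE VIEW order_overview AS
        SELECT o.id AS order_id, c.name AS customer, o.amount
        FROM orders o JOIN customers c ON c.id = o.customer_id;
""")
print(conn.execute("SELECT * FROM order_overview").fetchall())
# [(1, 'alice', 10.0)]
```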
Keeping History
I've already mentioned that above (SQL:2011).
The following is also true for every system that wants to keep history: "leading to exponentially growing table sizes." Although I'd say it's "ever growing", not "exponentially".
The tools to cope with it are triggers or ORMs. If you want to be sure nobody does a "destructive update", you can simply revoke UPDATE rights on that table (and DELETE as well, to be on the safe side).
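A sketch of the trigger approach (SQLite syntax; in a server database you would additionally revoke UPDATE/DELETE on the history table, which SQLite cannot express):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL);
    CREATE TABLE accounts_history (id INTEGER, old_balance REAL,
                                   changed_at TEXT);

    -- Before any destructive update, copy the old value aside.
    CREATE TRIGGER accounts_keep_history
    BEFORE UPDATE ON accounts
    BEGIN
        INSERT INTO accounts_history
            VALUES (OLD.id, OLD.balance, datetime('now'));
    END;

    INSERT INTO accounts VALUES (1, 100.0);
    UPDATE accounts SET balance = 80.0 WHERE id = 1;
""")
print(conn.execute("SELECT id, old_balance FROM accounts_history").fetchall())
# [(1, 100.0)] -- the pre-update state survives
```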
"prefer data loss to loss of consistency: "
I find that an interesting point of view. However, the SQL answer to this is that you try very hard not to get wrong data into the system in the first place, mostly by using a proper schema, constraints, and ACID. In that way your statement is somewhat right: instead of being accepted, inconsistent data is rejected (which is something different from being lost!). So you must handle the error at the time somebody is entering the rejected data, as opposed to some later time when you try to resolve inconsistencies because you accepted the bad data in the first place. So yes, that's the philosophy of the relational model and SQL!
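A tiny sketch of that philosophy: a CHECK constraint makes the database reject bad data at entry time, where the error can still be handled (table and rule are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE bookings (
        id    INTEGER PRIMARY KEY,
        seats INTEGER NOT NULL CHECK (seats > 0)  -- the schema encodes the rule
    )
""")
try:
    conn.execute("INSERT INTO bookings (seats) VALUES (-3)")
except sqlite3.IntegrityError as e:
    # The error surfaces at entry time, where it can still be handled,
    # rather than as an inconsistency discovered much later.
    print("rejected:", e)
```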
Whether SQL lacks human readability obviously depends on your background. However, the correctability using SQL is pretty nice, I'd say. Here I'd also like to quote from the original IBM paper about SEQUEL (at that time, that was its real name):
SEQUEL is intended as a data base sublanguage for both the professional programmer and the more infrequent data base user.
In my observation, this is absolutely true: I recently had an assignment to teach SQL to support staff so that they could investigate cases directly in the database. They were not programmers, yet they understood SQL quite quickly. And I think this is where your "human" argument kicks in: what they had problems with was navigating a real-world relational model consisting of several hundred tables. But that issue was quickly solved by asking development to provide views for some common tasks that involve more than a few tables. Joining those views then was no problem anymore.
Relational thinking requires a different mindset, very much as functional programming requires a different mindset. That's not good or bad, yet it might be unfamiliar to you. Once you use it regularly, you'll get used to it.
Object/Relational Impedance Mismatch
I think that topic doesn't need any long discussion: yes, it exists; yes, there are tools to cope with it in some way or another. I've made my point about excessive ORM use in my article.
Data structure changes
I think that is mostly due to a poor understanding of the relational model in the first place. Compare above: "the nature of the data".
It's also a pretty well-discussed argument: schema vs. "schemaless". Choose your flavor. "Schemaless" quite often just means "doesn't provide schema management tools"; nevertheless, you have to cope with the fact that we sometimes want to add more properties to an existing entity. RDBMSs provide tools for that: new columns can be nullable or have default values. More drastic changes, such as moving one attribute to an extra table (e.g. 1:n), can be done with CREATE TABLE ... AS SELECT. You might even provide a compatibility view that still delivers the data as it used to be (as though the moved attribute were still stored in the original table). Once you have changed your schema, your application can rely on its constraints (such as the existence of columns or the validity of constraints). That's quite a lot of stuff the database can do for you in an extremely reliable manner; stuff you don't need to care about in your app anymore.
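A sketch of those tools in action (SQLite syntax; some details, like dropping the moved column afterwards, are elided):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    INSERT INTO users VALUES (1, 'alice', 'a@example.com');

    -- A new column with a default: old rows remain valid.
    ALTER TABLE users ADD COLUMN locale TEXT DEFAULT 'en';

    -- A more drastic change: move the attribute to an extra table (1:n).
    CREATE TABLE user_emails AS SELECT id AS user_id, email FROM users;

    -- Compatibility view: old readers still see email "inside" users
    -- (dropping users.email afterwards is elided here).
    CREATE VIEW users_v1 AS
        SELECT u.id, u.name, e.email
        FROM users u LEFT JOIN user_emails e ON e.user_id = u.id;
""")
print(conn.execute("SELECT * FROM users_v1").fetchall())
# [(1, 'alice', 'a@example.com')]
```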
An argument you didn't mention is that such schema changes often involve downtime. That was definitely true in the past, and to some extent it is still true today; e.g. MySQL introduced online ALTER TABLE only recently, in 5.6. However, that is quite often an implementation limitation, not a problem inherently tied to the relational model or SQL. Even some more complex changes (like moving an attribute to another table) can be done online when done right and planned carefully (I've done that with one of the expensive databases that provide all the tools you need for it). Generally, it is about keeping the migration code out of your application and coping with it in the database. After the migration, you should have migration artifacts neither in the DB nor in the application code. Of course, there are cases where downtime is inevitable (I think ;).
"Mixing up of storage logic and application logic"
SQL actually does the exact opposite: SQL abstracts the storage layer completely away.
Nobody forces you to use stored procedures. I personally also think that stored procedures are overused, mostly because they are stored in the database and can thus be changed (optimized) by database administrators who might not have access to other source code. In other words: I think it is often done out of desperation.
The second argument is, of course, the excessive use of ORMs again and policies that disallow the use of real SQL in the application.