Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does SQL standard allow duplicate rows?

One of the core rules for the relational model is the required uniqueness for tuples (rows):

Every individual scalar value in the database must be logically addressable by specifying the name of the containing table, the name of the containing column and the primary key value of the containing row.

In a SQL world, that would mean that there could never exist two rows in a table for which all the column values were equal. If there was no meaningful way to guarantee uniqueness, a surrogate key could be presented to the table.

When the first SQL standard was released, it defined no such restriction and it has been like this ever since. This seems like a root for all kind of evil.

Is there any meaningful reason why it was decided to be that way? In a practical world, where could an absence of such restriction prove to be useful? Does it outweigh the cons?

like image 433
Jaanus Varus Avatar asked Jun 10 '15 21:06

Jaanus Varus


1 Answers

The short answer is that SQL is not relational and SQL DBMSs are not relational DBMSs.

Duplicate rows are a fundamental part of the SQL model of data because the SQL language doesn't really try to implement the relational algebra. SQL uses a bag (multiset)-based algebra instead. The results of queries and other operations in relational algebra are relations that always have distinct tuples, but SQL DBMSs don't have the luxury of dealing only with relations. Given this fundamental "feature" of the SQL language, SQL database engines need to have mechanisms for processing and storing duplicate rows.

Why was SQL designed that way? One reason seems to be that the relational model was just too big a leap of faith to make at that time. The relational model was an idea well ahead of its time. SQL on the other hand, was and remains very much rooted in the systems of three decades ago.

like image 70
nvogel Avatar answered Sep 17 '22 16:09

nvogel