Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

When can an SQL result contain duplicates?

I'm just learning about relational databases and was told they don't allow duplicate entries. I did not believe this at first because I wrote SQL queries which returned results with duplicates. It's that the system doesn't store sets that are identical in one relation right? I mean if you had a relation for books and authors and chosed only to select authors then it may appear that there are duplicates if the same author had written different books (e.g. select author from Books). Also if you use join then you can get duplicate results, right? Any other scenarios?

Basically when people say rmdbs don't have duplicates they just mean the way the information is stored (in one table) right?

like image 824
Celeritas Avatar asked Dec 07 '22 11:12

Celeritas


1 Answers

Relational databases consist only of relation variables (relational "tables") which by definition always consist of unique tuples ("rows"). A relational database management system uses a query language based on Codd's relational algebra which similarly can't generate duplicates in results because every result in relational algebra is also a relation and therefore consists of unique tuples.

Your confusion arises because you are assuming that SQL is a proper relational language and SQL DBMSs are relational. It isn't and they aren't. Things get tricky when you start to compare SQL with its near-equivalents in the relational world. Your SQL query:

SELECT author FROM Books;

may return duplicate rows, but the apparently similar relational query:

π author (Books)

will not. The relational query is more like the following SQL:

SELECT DISTINCT author FROM Books;

You have to be extremely careful in SQL to avoid getting duplicates where you don't want or expect them. It's a problem that trips up even experienced SQL users.

like image 111
nvogel Avatar answered Jan 05 '23 12:01

nvogel