I'm just learning about relational databases and was told they don't allow duplicate entries. I did not believe this at first because I wrote SQL queries which returned results with duplicates. It's that the system doesn't store sets that are identical in one relation right? I mean if you had a relation for books and authors and chosed only to select authors then it may appear that there are duplicates if the same author had written different books (e.g. select author from Books
). Also if you use join then you can get duplicate results, right? Any other scenarios?
Basically when people say rmdbs don't have duplicates they just mean the way the information is stored (in one table) right?
Relational databases consist only of relation variables (relational "tables") which by definition always consist of unique tuples ("rows"). A relational database management system uses a query language based on Codd's relational algebra which similarly can't generate duplicates in results because every result in relational algebra is also a relation and therefore consists of unique tuples.
Your confusion arises because you are assuming that SQL is a proper relational language and SQL DBMSs are relational. It isn't and they aren't. Things get tricky when you start to compare SQL with its near-equivalents in the relational world. Your SQL query:
SELECT author FROM Books;
may return duplicate rows, but the apparently similar relational query:
π author (Books)
will not. The relational query is more like the following SQL:
SELECT DISTINCT author FROM Books;
You have to be extremely careful in SQL to avoid getting duplicates where you don't want or expect them. It's a problem that trips up even experienced SQL users.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With