Unique key with NULLs

Tags:

database

null

mysql

relational-model

This question requires some hypothetical background. Let's consider an employee table with the columns name, date_of_birth, title, and salary, using MySQL as the RDBMS. If two people have the same name and birth date, they are, by definition, the same person (barring amazing coincidences where we have two people named Abraham Lincoln born on February 12, 1809), so we'll put a unique key on name and date_of_birth that means "don't store the same person twice." Now consider this data:

    id name        date_of_birth title          salary
     1 John Smith  1960-10-02    President      500,000
     2 Jane Doe    1982-05-05    Accountant      80,000
     3 Jim Johnson NULL          Office Manager  40,000
     4 Tim Smith   1899-04-11    Janitor         95,000

If I now try to run the following statement, it should and will fail:

INSERT INTO employee (name, date_of_birth, title, salary) VALUES ('Tim Smith', '1899-04-11', 'Janitor', '95,000')

If I try this one, it will succeed:

INSERT INTO employee (name, title, salary) VALUES ('Jim Johnson', 'Office Manager', '40,000')

And now my data will look like this:

    id name        date_of_birth title          salary
     1 John Smith  1960-10-02    President      500,000
     2 Jane Doe    1982-05-05    Accountant      80,000
     3 Jim Johnson NULL          Office Manager  40,000
     4 Tim Smith   1899-04-11    Janitor         95,000
     5 Jim Johnson NULL          Office Manager  40,000

This is not what I want but I can't say I entirely disagree with what happened. If we talk in terms of mathematical sets,

    {'Tim Smith', '1899-04-11'} = {'Tim Smith', '1899-04-11'}   <-- TRUE
    {'Tim Smith', '1899-04-11'} = {'Jane Doe', '1982-05-05'}    <-- FALSE
    {'Tim Smith', '1899-04-11'} = {'Jim Johnson', NULL}         <-- UNKNOWN
    {'Jim Johnson', NULL}       = {'Jim Johnson', NULL}         <-- UNKNOWN

My guess is that MySQL says, "Since I don't know that Jim Johnson with a NULL birth date isn't already in this table, I'll add him."
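The behaviour described above can be reproduced in a few lines. This is only a sketch: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL (an assumption made so the example runs without a server; SQLite also treats NULLs in a UNIQUE key as distinct). The column types and the helper name try_insert are illustrative, not part of the original schema.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; column types are guesses.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        id            INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        date_of_birth TEXT,              -- nullable: birth date may be unknown
        title         TEXT,
        salary        INTEGER,
        UNIQUE (name, date_of_birth)     -- "don't store the same person twice"
    )
""")

def try_insert(name, dob, title, salary):
    """Hypothetical helper: True if the row was stored, False if the key rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO employee (name, date_of_birth, title, salary) "
                "VALUES (?, ?, ?, ?)",
                (name, dob, title, salary),
            )
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert("Tim Smith", "1899-04-11", "Janitor", 95000),
    try_insert("Tim Smith", "1899-04-11", "Janitor", 95000),    # same name and date
    try_insert("Jim Johnson", None, "Office Manager", 40000),
    try_insert("Jim Johnson", None, "Office Manager", 40000),   # same name, NULL date
]

# NULL = NULL evaluates to NULL (unknown), which Python receives as None.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

print(results, null_equals_null)
```

The second Tim Smith is rejected, while both Jim Johnson rows are stored, because the comparison between the two NULL birth dates is unknown rather than true.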

My question is: How can I prevent duplicates even though date_of_birth is not always known? The best I've come up with so far is to move date_of_birth to a different table. The problem with that, however, is that I might end up with, say, two cashiers with the same name, title and salary, different birth dates and no way to store them both without having duplicates.

asked Nov 02 '10 20:11

Jason Swett

2 Answers

A fundamental property of a unique key is that it must be unique. Making part of that key nullable destroys this property, because a NULL never compares equal to anything, including another NULL.

There are two possible solutions to your problem:

- One way, the wrong way, would be to use some magic date to represent unknown. This just gets you past the DBMS "problem" but does not solve the problem in a logical sense. Expect problems with two "John Smith" entries having unknown dates of birth. Are these guys one and the same or are they unique individuals? If you know they are different then you are back to the same old problem - your Unique Key just isn't unique. Don't even think about assigning a whole range of magic dates to represent "unknown" - this is truly the road to hell.
- A better way is to create an EmployeeId attribute as a surrogate key. This is just an arbitrary identifier that you assign to individuals that you know are unique. This identifier is often just an integer value. Then create an Employee table to relate the EmployeeId (unique, non-nullable key) to what you believe are the dependent attributes, in this case Name and Date of Birth (any of which may be nullable). Use the EmployeeId surrogate key everywhere that you previously used the Name/Date-of-Birth. This adds a new table to your system but solves the problem of unknown values in a robust manner.
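A minimal sketch of the surrogate-key layout might look like the following. It again uses Python's sqlite3 module with an in-memory database as a stand-in for MySQL (an assumption). The job table, the add_employee helper, and the sample titles and salaries are hypothetical and exist only to show other data referring to a person by employee_id instead of by name and birth date.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; table layout is illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        employee_id   INTEGER PRIMARY KEY,  -- surrogate key: unique, never NULL
        name          TEXT NOT NULL,
        date_of_birth TEXT                  -- may be NULL when unknown
    );
    CREATE TABLE job (                      -- hypothetical table using the key
        employee_id INTEGER NOT NULL REFERENCES employee (employee_id),
        title       TEXT NOT NULL,
        salary      INTEGER NOT NULL
    );
""")

def add_employee(name, dob=None):
    """Hypothetical helper: store one person and return their surrogate key."""
    with conn:
        cur = conn.execute(
            "INSERT INTO employee (name, date_of_birth) VALUES (?, ?)",
            (name, dob),
        )
    return cur.lastrowid

# Two people known to be different, both with an unknown birth date.
jim_a = add_employee("Jim Johnson")
jim_b = add_employee("Jim Johnson")

with conn:
    # Made-up sample rows, keyed by the surrogate identifier.
    conn.execute("INSERT INTO job VALUES (?, ?, ?)", (jim_a, "Office Manager", 40000))
    conn.execute("INSERT INTO job VALUES (?, ?, ?)", (jim_b, "Cashier", 30000))

rows = conn.execute("""
    SELECT e.employee_id, e.name, j.title
    FROM employee AS e
    JOIN job AS j ON j.employee_id = e.employee_id
    ORDER BY e.employee_id
""").fetchall()
print(rows)
```

Note that nothing in this layout rejects a second Jim Johnson automatically. Deciding whether a new record is a new person or an existing one is left to whoever assigns the identifier, which is the point of the approach.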

answered Sep 21 '22 12:09

NealB

I think MySQL does it right here. Some other databases (for example Microsoft SQL Server) treat NULL as a value that can only be inserted once into a UNIQUE column, but personally I find this to be strange and unexpected behaviour.

However, since this is what you want, you can use some "magic" value instead of NULL, such as a date a long time in the past.
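As a rough sketch of the magic-value idea, the column can be declared NOT NULL with the sentinel as its default. The example uses Python's sqlite3 module with an in-memory database as a stand-in for MySQL (an assumption). The sentinel '1000-01-01' and the names SENTINEL_DOB and insert_without_dob are hypothetical choices for illustration.

```python
import sqlite3

# Hypothetical sentinel meaning "birth date unknown".
SENTINEL_DOB = "1000-01-01"

# Assumption: in-memory SQLite stands in for MySQL; column types are guesses.
conn = sqlite3.connect(":memory:")
conn.execute(f"""
    CREATE TABLE employee (
        id            INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        date_of_birth TEXT NOT NULL DEFAULT '{SENTINEL_DOB}',
        title         TEXT,
        salary        INTEGER,
        UNIQUE (name, date_of_birth)
    )
""")

def insert_without_dob(name, title, salary):
    """Hypothetical helper: insert a row that omits date_of_birth entirely."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO employee (name, title, salary) VALUES (?, ?, ?)",
                (name, title, salary),
            )
        return True
    except sqlite3.IntegrityError:
        return False

first = insert_without_dob("Jim Johnson", "Office Manager", 40000)
second = insert_without_dob("Jim Johnson", "Office Manager", 40000)  # now rejected

count = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE name = 'Jim Johnson'"
).fetchone()[0]
print(first, second, count)
```

The second insert fails because both rows carry the same sentinel date. The trade-off is that two genuinely different people who share a name and both have unknown birth dates can no longer be stored side by side.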

answered Sep 23 '22 12:09

Mark Byers
