At what point does data normalization become ludicrous?

Question

I often find myself questioning whether I'm taking the right approach in trying to plan for future expansibility when creating databases and relations.

I have the following situation:

I have a Donor table and a Recipient table. Both tables share common information such as first_name, last_name, email_address, date_of_birth, etc. Both seem to, if you'll excuse my object-oriented language, share a common abstract type of Person. It's possible that someone who is at one point a Recipient may later become a Donor by means of giving a donation, so it's important that information isn't duplicated across tables. Should I opt for an inheritance pattern, or should I just foreign key Donors and Recipients to a Person table?
Initially, I was thinking of simply mapping properties like email_address and street address properties directly into the things that need them, but then the possibility may arise that a person would have multiple email addresses or mailing addresses (ie: home, work, etc.). What that means is that we have a model somewhat like this:
```
create table person(id int primary key auto increment, ..., 
    default_email_address);

create table email_address(id int primary key auto increment, 
    email varchar(255), name varchar(255), is_default bool, person_id int);
```
This makes things a bit complicated, as you can imagine. The name field also involves a list of default values as well as allowing custom input. I can't just make it an enum field, because the possibility exists that someone will have a lot of emails to add that could all be different... (this is the point at which I scream out "IS IT EVEN WORTH IT ANYMORE!?!?" and get frustrated with the project)

I guess what this really boils down to is the following: at what point does data normalization become ludicrous? My goal here is to create a really good as-forward-compatible-as-possible data model that I won't kick myself for creating later.

Oded · Accepted Answer

at what point does data normalization become ludicrous?

At the point that it stops modelling the actual requirements.

To take your examples:

With the Donor and Recipient tables, if it is highly likely that any one person will become both, then it does make sense to separate out to a Person entity. If this is rare, it doesn't.
With the email_address and street_address situations, it depends whether you do need to store multiples or not (what is the expectation?). You may want to store separate versions per business unit (say shipping_address vs billing_address).

Sergey Kalinichenko · Answer

I think the problem is not in your implementation, but rather in your analysis of the problem. Donor and Recipient are not first-class actors, they are roles of the actors. If you model them as such, you'd get a somewhat cleaner model:

You'd have a person table with addresses and so on
You'd also have an address table with addresses of the people
You'd also have a person_role table, with the role code (donor, recipient) and other relevant information. You may want to get fancy, and add person_donor and person_recipient, with a foreign key into the person table.

Mike Sherrill 'Cat Recall' · Answer

Short answer: Normalization never becomes ridiculous. Most of what you're doing isn't normalization.

Longer answer

The "worst" (in truth, the "best) most designers can practically do is end up with all tables in 5NF. 5NF isn't ridiculous at all. (Yes, I know about 6NF. I'm ignoring it for didactic reasons.)

questioning whether I'm taking the right approach in trying to plan for future expansibility

That's a good question to ask yourself. It has nothing to do with normalization, though. At the conceptual level, normalization is something you do after you've decided what attributes (columns) and data need to go into your database. Experienced database designers often "think in 3NF", choosing attributes, data, and normalizing all at the same time, more or less.

Should I opt for an inheritance pattern, or should I just foreign key Donors and Recipients to a Person table?

Donors and recipients aren't different types of people. Donors are people who have made a donation. Recipients are people who have received something.

id fullname       don_date    don_amt  recip_date  recip_amt
--
1  Jamie Hubbert  2012-01-13  $20.00
1  Jamie Hubbert  2012-02-13  $17.00
2  Kelly Hawkin   2012-01-13  $50.00
2  Kelly Hawkin   2012-01-13  $20.00
3  Neva Papke                          2012-01-13  $15.00
3  Neva Papke                          2012-02-13  $15.00
2  Kelly Hawkin                        2012-01-13  $10.00
4  Jamie Hubbert  2012-01-13  $10.00   
4  Jamie Hubbert                       2012-02-13  $10.00

During normalization, you'd identify these dependencies. (For simplicity, assumes one donation per person per date.)

person_id -> person_name
person_id -> email
person_id, donation_date -> donation_amount
person_id, recip_date -> recip_amount

Normalize to 5NF, and you'd get these three tables.

Persons
--
1  Jamie Hubbert
2  Kelly Hawkin
3  Neva Papke  
4  Jamie Hubbert

Donations
--
1  2012-01-13  $20.00
1  2012-02-13  $17.00
2  2012-01-13  $50.00
2  2012-01-13  $20.00
4  2012-01-13  $10.00   

Receipts (?)
--
3  2012-01-13  $15.00
3  2012-02-13  $15.00
2  2012-01-13  $10.00
4  2012-02-13  $10.00

Initially, I was thinking of simply mapping properties like email_address and street address properties directly into the things that need them, but then the possibility may arise that a person would have multiple email addresses or mailing addresses (ie: home, work, etc.).

Deciding whether to support multiple email addresses, multiple mailing addresses, and different mailing and delivery addresses is a significant design decision. But it has nothing to do with normalization. Normalization, again, is something you do after you've decided which attributes and data belong in your database. So, if you were collecting representative sample data, you might end up with one of these two sets of email addresses.

Set A
1  Jamie Hubbert  jhubbert@somedomain.com
4  Jamie Hubbert  jamie.hubbert@this.com

Set B
1  Jamie Hubbert  jhubbert@somedomain.com
1  Jamie Hubbert  jamie@my.com
4  Jamie Hubbert  jamie.hubbert@this.com

In set A, person_id->email. In set B, it doesn't. Choosing to support the data in set A or the data in set B is a big decision, and it strongly affects what you end up with after normalizing to 5NF. But deciding which set to support has nothing to do with normalization.

As an aside, choosing to assign id numbers to non-unique email addresses is another big (and questionable) design decision. Like others, this decision has nothing to do with normalization.

(Random names courtesy of The Random Name generator.)

At what point does data normalization become ludicrous?

Tags:

sql

orm

database-normalization

Naftuli Kay

3 Answers

Oded

Sergey Kalinichenko

Mike Sherrill 'Cat Recall'

Recent Activity

Donate For Us

At what point does data normalization become ludicrous?

Tags:

sql

orm

database-normalization

Naftuli Kay

3 Answers

Oded

Sergey Kalinichenko

Mike Sherrill 'Cat Recall'

Related questions

Recent Activity

Donate For Us