Diamond schema: how (de)normalized is that?

Tags: sql, database, schema, normalization

Let's suppose we have the following entities:

Production Studio
Journalist
Camera Operator
News Footage

In this simple world, a production studio has many journalists and many camera operators. Each journalist belongs to exactly one studio, and the same goes for operators. A piece of news footage is produced by one journalist and one operator, both of whom come from the same studio.

Here's my naive approach to putting this model into a relational database:

CREATE TABLE production_studios(
  id                   SERIAL PRIMARY KEY,
  title                TEXT NOT NULL
);

CREATE TABLE journalists(
  id                   SERIAL PRIMARY KEY,
  name                 TEXT NOT NULL,
  production_studio_id INTEGER NOT NULL REFERENCES production_studios
);

CREATE TABLE camera_operators(
  id                   SERIAL PRIMARY KEY,
  name                 TEXT NOT NULL,
  production_studio_id INTEGER NOT NULL REFERENCES production_studios
);

CREATE TABLE news_footages(
  id                   SERIAL PRIMARY KEY,
  description          TEXT NOT NULL,
  journalist_id        INTEGER NOT NULL REFERENCES journalists,
  camera_operator_id   INTEGER NOT NULL REFERENCES camera_operators
);

This schema forms a nicely shaped diamond ERD and raises a few questions.

The problem is that a news footage record can link a journalist with a camera operator who come from different production studios. I understand that this can be cured by writing the corresponding constraints, but for the sake of experiment let's pretend that we're doing an exercise in normal-form database design.
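
To make the anomaly concrete, a query along these lines would list the offending rows. It is only a sketch against the tables above, and it assumes the studio foreign-key column is spelled production_studio_id in both journalists and camera_operators; the output column aliases are made up for illustration.

```sql
-- Sketch only: lists footage whose journalist and camera operator
-- belong to different studios (the anomaly described above).
-- Assumes both tables name the column production_studio_id.
SELECT f.id,
       j.production_studio_id AS journalist_studio_id,
       c.production_studio_id AS operator_studio_id
FROM news_footages    f
JOIN journalists      j ON j.id = f.journalist_id
JOIN camera_operators c ON c.id = f.camera_operator_id
WHERE j.production_studio_id <> c.production_studio_id;
```

With the schema as written, nothing stops this query from returning rows.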

The first question is about terminology: is it correct to state that this schema is denormalized? If so, which normal form does it break? Or is there a better name for this anomaly, such as inter-record redundancy, multipath relationships, etc.?
The second question: how can this schema be changed to make the described anomaly impossible?

And of course I'd very much appreciate references to papers addressing this specific issue.

asked Feb 23 '12 18:02

Serge Balyuk

1 Answer

The naive way would be to make your journalists and camera_operators dependent entities, dependent upon the studio for which they work. That means the production studio foreign key becomes part of their primary key. Your news_footage table then has a primary key consisting of 4 components:

production_studio_id
journalist_id
camera_operator_id
footage_id

and two foreign keys:

journalist_id, production_studio_id, pointing to the journalist table, and
camera_operator_id, production_studio_id, pointing to the camera operator table
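
As a rough sketch, the dependent-entity design just described might look like the following. The singular table names, the name column and the column order inside each key are assumptions made for illustration; only the key structure comes from the description above.

```sql
-- Sketch of the dependent-entity design; table and column names are illustrative.
create table production_studio
(
  id    int          not null primary key ,
  title varchar(200) not null
) ;

-- A journalist is identified by (studio, id): the studio is part of the key.
create table journalist
(
  production_studio_id int          not null references production_studio ( id ) ,
  id                   int          not null ,
  name                 varchar(200) not null ,
  primary key ( production_studio_id , id )
) ;

-- Same pattern for camera operators.
create table camera_operator
(
  production_studio_id int          not null references production_studio ( id ) ,
  id                   int          not null ,
  name                 varchar(200) not null ,
  primary key ( production_studio_id , id )
) ;

-- Both composite foreign keys share the single production_studio_id column,
-- so the journalist and the operator must belong to the same studio.
create table news_footage
(
  production_studio_id int          not null ,
  journalist_id        int          not null ,
  camera_operator_id   int          not null ,
  footage_id           int          not null ,
  description          varchar(200) not null ,
  primary key ( production_studio_id , journalist_id , camera_operator_id , footage_id ) ,
  foreign key ( production_studio_id , journalist_id )
    references journalist ( production_studio_id , id ) ,
  foreign key ( production_studio_id , camera_operator_id )
    references camera_operator ( production_studio_id , id )
) ;
```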

Easy.

Or not. Now you have defined in your E-R model the notion that the very existence of a camera operator or a journalist is dependent upon the studio for which they work. This does not reflect the real world very well: in this model, people can't change their employer.

Let's not do that.

In your original model, you are confusing a person with the role they play (journalist or camera operator), and you're missing a somewhat transient entity that is actually responsible for the production of your news footage: the [studio-specific] production team.

My E-R model would look something like this:

create table studio
(
  id    int          not null primary key ,
  title varchar(200) not null
) ;

create table person
(
  id   int          not null primary key ,
  name varchar(200) not null
) ;

-- a studio-specific pairing of a journalist and a camera operator
create table team
(
  studio_id          int not null ,
  journalist_id      int not null ,
  camera_operator_id int not null ,

  primary key ( studio_id , journalist_id , camera_operator_id ) ,

  foreign key ( studio_id          ) references studio ( id ) ,
  foreign key ( journalist_id      ) references person ( id ) ,
  foreign key ( camera_operator_id ) references person ( id )
) ;

-- each piece of footage belongs to exactly one team
create table footage
(
  studio_id          int not null ,
  journalist_id      int not null ,
  camera_operator_id int not null ,
  id                 int not null ,
  description        varchar(200) not null ,

  primary key ( studio_id , journalist_id , camera_operator_id , id ) ,

  foreign key     ( studio_id , journalist_id , camera_operator_id )
  references team ( studio_id , journalist_id , camera_operator_id )
) ;

Now you have a world in which people can work in different roles: the same person might be a camera operator in some contexts and a journalist in others. People can change employers. Studio-specific teams are composed, consisting of a journalist and a camera operator. In some contexts, the same person might play both roles on a team. And, finally, a piece of news footage is produced by one and only one studio-specific team.
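
A few sample rows may help to show how this plays out. Everything here (the ids, the people, the studio names and the footage descriptions) is made up for illustration. Values for studio and person are listed positionally: the id first, then the text column.

```sql
-- Illustrative data only; all ids and names are invented.
insert into studio values ( 1 , 'Studio A' ) ;
insert into studio values ( 2 , 'Studio B' ) ;

insert into person values ( 10 , 'Alice' ) ;
insert into person values ( 11 , 'Bob' ) ;

-- At Studio A, Alice is the journalist and Bob is the camera operator.
insert into team ( studio_id , journalist_id , camera_operator_id )
values ( 1 , 10 , 11 ) ;

-- At Studio B the roles are swapped: the same people, different roles.
insert into team ( studio_id , journalist_id , camera_operator_id )
values ( 2 , 11 , 10 ) ;

-- Accepted: team ( 1 , 10 , 11 ) exists.
insert into footage ( studio_id , journalist_id , camera_operator_id , id , description )
values ( 1 , 10 , 11 , 1 , 'Election night' ) ;

-- Expected to be rejected with a foreign key violation:
-- there is no team ( 1 , 11 , 10 ) at Studio A.
insert into footage ( studio_id , journalist_id , camera_operator_id , id , description )
values ( 1 , 11 , 10 , 2 , 'Flood coverage' ) ;
```

Footage can only ever point at a team that has actually been composed, and every team belongs to exactly one studio.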

This reflects the real world much better, and it is much more flexible.

Edited to add sample query:

To find the journalists working for a particular studio:

select p.*
from studio s
join team   t on t.studio_id = s.id
join person p on p.id        = t.journalist_id
where s.title = 'my desired studio name'

This would give you the set of people who are (or have been) associated with a studio in the role of journalist. One should note, though, that in the real world people work for employers for a period of time: to model it properly you need a start/end date, and you need to qualify the query with a relative notion of now.
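
One possible way to sketch that, assuming the dates are kept on the team itself: the start_date and end_date columns below are hypothetical, and the alter table syntax is PostgreSQL-flavoured, so it may need adjusting for other databases.

```sql
-- Hypothetical columns: when the team was formed and when it was dissolved.
alter table team add column start_date date not null default current_date ;
alter table team add column end_date   date ;  -- null means the team is still active

-- Journalists working for the studio as of today.
select distinct p.*
from studio s
join team   t on t.studio_id = s.id
join person p on p.id        = t.journalist_id
where s.title = 'my desired studio name'
  and t.start_date <= current_date
  and ( t.end_date is null or t.end_date > current_date ) ;
```

Note that with the primary key unchanged, the same trio of studio, journalist and camera operator can only be recorded once. If a team can be re-formed later, start_date would probably need to become part of the key.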

answered Oct 13 '22 20:10

Nicholas Carey
