Which of these DB designs is better?

Tags: sql, database

I'm trying to understand some concepts of DB design.

I have three tables:

Movies (id,title)
1 - The Godfather
2 - Matrix


Attribute (id,name)
1 - Country
2 - Type

Attribute Value(attribute_id,id,value)

1,1,USA
1,2,Japan
2,1,Thriller
2,2,Comedy

I would like to link each movie to a given attribute at most once, with exactly one value for that attribute.

E.g.: The Godfather, Country: USA, Type: Crime

I'm trying to find out which of the following is the best solution for linking attributes to a movie. I can see 4 different options:

Schema A


The problem I see is that I can't prevent a movie from having several attribute_values of the same attribute. For example, ("godfather","USA","JAPAN") is a valid row, so the restriction would have to be enforced by the application.
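A minimal sketch of this problem using Python's built-in `sqlite3` module. It assumes Schema A is a plain junction table `movie_attribute_value(movie_id, attribute_value_id)` and that `attribute_value.id` is unique across all attributes; these columns are a reconstruction, not taken from the original diagram.

```python
import sqlite3

# Assumed Schema A: movie <-> attribute_value junction with no attribute column.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE attribute (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE attribute_value (
    id INTEGER PRIMARY KEY,
    attribute_id INTEGER NOT NULL REFERENCES attribute(id),
    value TEXT NOT NULL
);
CREATE TABLE movie_attribute_value (
    movie_id INTEGER NOT NULL REFERENCES movie(id),
    attribute_value_id INTEGER NOT NULL REFERENCES attribute_value(id),
    PRIMARY KEY (movie_id, attribute_value_id)
);
INSERT INTO movie VALUES (1, 'The Godfather');
INSERT INTO attribute VALUES (1, 'Country');
INSERT INTO attribute_value VALUES (1, 1, 'USA'), (2, 1, 'Japan');
""")

# Both Country values are accepted for the same movie: no constraint
# says a movie may hold only one value per attribute.
conn.execute("INSERT INTO movie_attribute_value VALUES (1, 1)")  # USA
conn.execute("INSERT INTO movie_attribute_value VALUES (1, 2)")  # Japan

countries = conn.execute("""
    SELECT av.value
    FROM movie_attribute_value mav
    JOIN attribute_value av ON av.id = mav.attribute_value_id
    WHERE mav.movie_id = 1 AND av.attribute_id = 1
    ORDER BY av.value
""").fetchall()
print(countries)  # [('Japan',), ('USA',)]
```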

Schema B


It's almost the same as Schema A, but with Attribute Value as a weak entity. I think this has no effect at the database level, but it would make fetching attribute values a bit harder, because you need the attribute key as well. This schema still allows the same attribute to be repeated multiple times with different values, so I don't think it's a good option either. As with Schema A, the restriction would have to be enforced by the application:

("godfather","Country:USA","Country:JAPAN") is a valid row
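A sketch of Schema B under similar assumptions (Python `sqlite3`; table and column names are illustrative). Here `attribute_value` is identified by `(attribute_id, id)`, matching the sample data where `id` restarts at 1 for each attribute, and the junction table's primary key covers all three columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE attribute (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Weak entity: a value only has an identity together with its attribute.
CREATE TABLE attribute_value (
    attribute_id INTEGER NOT NULL REFERENCES attribute(id),
    id INTEGER NOT NULL,
    value TEXT NOT NULL,
    PRIMARY KEY (attribute_id, id)
);
-- Assumed Schema B junction: every column is part of the key.
CREATE TABLE movie_attribute_value (
    movie_id INTEGER NOT NULL REFERENCES movie(id),
    attribute_id INTEGER NOT NULL,
    attribute_value_id INTEGER NOT NULL,
    PRIMARY KEY (movie_id, attribute_id, attribute_value_id),
    FOREIGN KEY (attribute_id, attribute_value_id)
        REFERENCES attribute_value(attribute_id, id)
);
INSERT INTO movie VALUES (1, 'The Godfather');
INSERT INTO attribute VALUES (1, 'Country'), (2, 'Type');
INSERT INTO attribute_value VALUES
    (1, 1, 'USA'), (1, 2, 'Japan'), (2, 1, 'Thriller'), (2, 2, 'Comedy');
""")

# Country:USA and Country:Japan are different key values, so both are accepted.
conn.execute("INSERT INTO movie_attribute_value VALUES (1, 1, 1)")
conn.execute("INSERT INTO movie_attribute_value VALUES (1, 1, 2)")

n_countries = conn.execute(
    "SELECT COUNT(*) FROM movie_attribute_value WHERE movie_id = 1 AND attribute_id = 1"
).fetchone()[0]
print(n_countries)  # 2
```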

Schema C


I think this is the correct one, because now we can't add more than one attribute of the same type to a movie: ("Godfather","USA","JAPAN") is not a valid insertion!

But I can't tell whether making attribute_value a weak entity would be correct, or whether it would be better or worse :S
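A sketch of Schema C, again with assumed columns: the junction table is keyed on `(movie_id, attribute_id)`, and `attribute_id` and `attribute_value_id` are separate single-column foreign keys. In this reconstruction the primary key blocks a second Country, but because the two foreign keys are checked independently, nothing ties the chosen value to the attribute it is stored under.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE attribute (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE attribute_value (
    id INTEGER PRIMARY KEY,
    attribute_id INTEGER NOT NULL REFERENCES attribute(id),
    value TEXT NOT NULL
);
-- One row per (movie, attribute): this key is what rejects a second Country.
CREATE TABLE movie_attribute_value (
    movie_id INTEGER NOT NULL REFERENCES movie(id),
    attribute_id INTEGER NOT NULL REFERENCES attribute(id),
    attribute_value_id INTEGER NOT NULL REFERENCES attribute_value(id),
    PRIMARY KEY (movie_id, attribute_id)
);
INSERT INTO movie VALUES (1, 'The Godfather'), (2, 'Matrix');
INSERT INTO attribute VALUES (1, 'Country'), (2, 'Type');
INSERT INTO attribute_value VALUES
    (1, 1, 'USA'), (2, 1, 'Japan'), (3, 2, 'Thriller'), (4, 2, 'Comedy');
""")

conn.execute("INSERT INTO movie_attribute_value VALUES (1, 1, 1)")  # Country: USA
try:
    conn.execute("INSERT INTO movie_attribute_value VALUES (1, 1, 2)")  # Country: Japan
    second_country_rejected = False
except sqlite3.IntegrityError:
    second_country_rejected = True

# Independent foreign keys: "Country: Thriller" is accepted.
conn.execute("INSERT INTO movie_attribute_value VALUES (2, 1, 3)")
mismatched = conn.execute("""
    SELECT COUNT(*)
    FROM movie_attribute_value mav
    JOIN attribute_value av ON av.id = mav.attribute_value_id
    WHERE av.attribute_id <> mav.attribute_id
""").fetchone()[0]
print(second_country_rejected, mismatched)  # True 1
```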

Schema D


As I said, this is the same as C but with a composite key in attribute_value. I'm not sure whether this breaks any database normalization rules. If it is OK, which table should movie_attribute_value reference for the attribute_id field: the ID from the Attribute table, or the AttributeID from the AttributeValue table? And is it OK to have a composite foreign key and use only part of that key in the PK?
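A sketch of Schema D as described above, with assumed table and column names (Python `sqlite3`). `attribute_value` has the composite key `(attribute_id, id)`, and `movie_attribute_value` uses `attribute_id` both in its primary key and in a composite foreign key to `attribute_value`. In this layout `attribute_id` is referenced through AttributeValue, which is what ties each value to its attribute; having only part of a foreign key in the primary key is ordinary, legal SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE attribute (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE attribute_value (
    attribute_id INTEGER NOT NULL REFERENCES attribute(id),
    id INTEGER NOT NULL,
    value TEXT NOT NULL,
    PRIMARY KEY (attribute_id, id)
);
-- attribute_id appears in the PK (one value per attribute per movie)
-- and in the composite FK (the value must belong to that attribute).
CREATE TABLE movie_attribute_value (
    movie_id INTEGER NOT NULL REFERENCES movie(id),
    attribute_id INTEGER NOT NULL,
    attribute_value_id INTEGER NOT NULL,
    PRIMARY KEY (movie_id, attribute_id),
    FOREIGN KEY (attribute_id, attribute_value_id)
        REFERENCES attribute_value(attribute_id, id)
);
INSERT INTO movie VALUES (1, 'The Godfather');
INSERT INTO attribute VALUES (1, 'Country'), (2, 'Type');
INSERT INTO attribute_value VALUES
    (1, 1, 'USA'), (1, 2, 'Japan'), (2, 1, 'Thriller'), (2, 2, 'Comedy');
""")

def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO movie_attribute_value VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

results = {
    "Country: USA": try_insert((1, 1, 1)),
    "Country: Japan": try_insert((1, 1, 2)),   # rejected by the primary key
    "Type: value 3": try_insert((1, 2, 3)),    # rejected by the foreign key
    "Type: Thriller": try_insert((1, 2, 1)),
}
print(results)
# {'Country: USA': True, 'Country: Japan': False, 'Type: value 3': False, 'Type: Thriller': True}
```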

Could you please explain which option is better and why?

Thanks in advance!

EDIT

I understand the problems with a design like this, what an EAV schema is, and the need to avoid this type of schema except when the attribute table changes a lot. Unfortunately, that is my scenario: the movie attributes are defined by users, so I have no way of knowing which attributes will be used. I have to read them and display them to other users to fill in. I think Schema C is correct, but I would like to know what the problem is with using Schemas A and B and letting developers enforce the restriction (one attribute of each type per movie) in code.

It would also be great if somebody could explain the benefits and pitfalls of using Schema D (composite key) instead of Schema C, and whether it's OK to use only some of the fields of a foreign key (attribute_value_id, attribute_id) in the PK (movie_id, attribute_id).

Asked Sep 20 '13 10:09 by Aitor


2 Answers

As Marc_s comments, EAV designs have a whole bunch of drawbacks. In the case of a movie collection, you know the schema, and it's unlikely to change randomly, and when it does change (e.g. you need to add a flag "available in 4K"), it's probably a big deal.

Ask yourself how you will retrieve all films for a given genre, or all films available in both the US and Japan, or all comedies available in the US but not Japan - you'll very quickly see the limits of EAV.
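To make this concrete, here is a hypothetical single-table EAV layout `movie_attribute(movie_id, attribute, value)` with made-up sample rows, in Python `sqlite3`. Finding comedies available in the US but not Japan already needs one self-join per positive condition plus a `NOT EXISTS` subquery for the negative one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE movie_attribute (
    movie_id INTEGER NOT NULL REFERENCES movie(id),
    attribute TEXT NOT NULL,
    value TEXT NOT NULL
);
-- Hypothetical sample data; 'Comedy A' and 'Comedy B' are placeholder titles.
INSERT INTO movie VALUES (1, 'The Godfather'), (2, 'Comedy A'), (3, 'Comedy B');
INSERT INTO movie_attribute VALUES
    (1, 'Type', 'Crime'),  (1, 'Country', 'USA'),
    (2, 'Type', 'Comedy'), (2, 'Country', 'USA'),
    (3, 'Type', 'Comedy'), (3, 'Country', 'USA'), (3, 'Country', 'Japan');
""")

# Each attribute condition becomes another join against the same table.
titles = [row[0] for row in conn.execute("""
    SELECT m.title
    FROM movie m
    JOIN movie_attribute t
      ON t.movie_id = m.id AND t.attribute = 'Type' AND t.value = 'Comedy'
    JOIN movie_attribute us
      ON us.movie_id = m.id AND us.attribute = 'Country' AND us.value = 'USA'
    WHERE NOT EXISTS (
        SELECT 1 FROM movie_attribute jp
        WHERE jp.movie_id = m.id AND jp.attribute = 'Country' AND jp.value = 'Japan'
    )
    ORDER BY m.title
""")]
print(titles)  # ['Comedy A']
```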

To answer your question - none of your designs work for me - there are too many tables that don't earn their keep. If you really must go EAV, I'd suggest:

MOVIES
---------
MovieID
.....


ATTRIBUTES
--------------
AttributeID
AttributeName

MOVIE_ATTRIBUTES
------------
MovieID
AttributeID
Value
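A minimal sketch of this schema in Python `sqlite3`. The primary key `(MovieID, AttributeID)` is an assumption, not part of the answer, but it would enforce the question's one-value-per-attribute rule without any extra tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE MOVIES (MovieID INTEGER PRIMARY KEY, Title TEXT NOT NULL);
CREATE TABLE ATTRIBUTES (AttributeID INTEGER PRIMARY KEY, AttributeName TEXT NOT NULL);
-- Assumed key: at most one Value per (movie, attribute).
CREATE TABLE MOVIE_ATTRIBUTES (
    MovieID INTEGER NOT NULL REFERENCES MOVIES(MovieID),
    AttributeID INTEGER NOT NULL REFERENCES ATTRIBUTES(AttributeID),
    Value TEXT NOT NULL,
    PRIMARY KEY (MovieID, AttributeID)
);
INSERT INTO MOVIES VALUES (1, 'The Godfather');
INSERT INTO ATTRIBUTES VALUES (1, 'Country'), (2, 'Type');
INSERT INTO MOVIE_ATTRIBUTES VALUES (1, 1, 'USA'), (1, 2, 'Crime');
""")

# A second Country for the same movie violates the primary key.
try:
    conn.execute("INSERT INTO MOVIE_ATTRIBUTES VALUES (1, 1, 'Japan')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

attributes = conn.execute("""
    SELECT a.AttributeName, ma.Value
    FROM MOVIE_ATTRIBUTES ma
    JOIN ATTRIBUTES a ON a.AttributeID = ma.AttributeID
    WHERE ma.MovieID = 1
    ORDER BY a.AttributeName
""").fetchall()
print(duplicate_rejected, attributes)  # True [('Country', 'USA'), ('Type', 'Crime')]
```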

If you want to provide a list of valid values, the easiest way is to query the "movie attributes" table and retrieve previous entries for that combination of movie and attribute; keeping your schema simple will make life MUCH easier.
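One way to read this suggestion, sketched in Python `sqlite3`: collect the distinct values already entered for an attribute and offer them as choices. The `suggested_values` helper and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ATTRIBUTES (AttributeID INTEGER PRIMARY KEY, AttributeName TEXT NOT NULL);
CREATE TABLE MOVIE_ATTRIBUTES (
    MovieID INTEGER NOT NULL,
    AttributeID INTEGER NOT NULL REFERENCES ATTRIBUTES(AttributeID),
    Value TEXT NOT NULL,
    PRIMARY KEY (MovieID, AttributeID)
);
INSERT INTO ATTRIBUTES VALUES (1, 'Country'), (2, 'Type');
INSERT INTO MOVIE_ATTRIBUTES VALUES
    (1, 1, 'USA'), (1, 2, 'Crime'),
    (2, 1, 'USA'), (3, 1, 'Japan');
""")

def suggested_values(conn, attribute_id):
    """Distinct values previously entered for one attribute, for a pick list."""
    rows = conn.execute(
        "SELECT DISTINCT Value FROM MOVIE_ATTRIBUTES WHERE AttributeID = ? ORDER BY Value",
        (attribute_id,),
    )
    return [value for (value,) in rows]

print(suggested_values(conn, 1))  # ['Japan', 'USA']
```

This keeps the value list derived from existing data rather than stored in a separate table, which is the trade-off the answer is pointing at.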

If you really want to put the values in a separate table, schema D appears correct.

Schema C says:

  • for each movie, I have 0 or more movie_attribute_value records
  • for each movie_attribute_value record, I have 0 or more attribute_value records
  • for each attribute_value record, I have 0 or more attribute/value combinations.

I believe the last statement is incorrect.

Answered Oct 22 '22 17:10 by Neville Kuyt


One approach would be to put all the attributes in one table, each tagged with a defined attribute type. Thus:

Movies
------
MovieId


AttributeTypes
---------------
AttributeTypeId
Description


Attributes
---------
AttributeId
AttributeTypeId
Description


MovieAttributes
---------------
MovieId
AttributeId

It could make for awkward queries, but that really depends on how the stored data will be used.
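A minimal sketch of this layout in Python `sqlite3`, with assumed column types, a `Title` column and sample rows added for illustration, plus the four-table join needed to list a movie's attributes by type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Movies (MovieId INTEGER PRIMARY KEY, Title TEXT NOT NULL);
CREATE TABLE AttributeTypes (AttributeTypeId INTEGER PRIMARY KEY, Description TEXT NOT NULL);
CREATE TABLE Attributes (
    AttributeId INTEGER PRIMARY KEY,
    AttributeTypeId INTEGER NOT NULL REFERENCES AttributeTypes(AttributeTypeId),
    Description TEXT NOT NULL
);
CREATE TABLE MovieAttributes (
    MovieId INTEGER NOT NULL REFERENCES Movies(MovieId),
    AttributeId INTEGER NOT NULL REFERENCES Attributes(AttributeId),
    PRIMARY KEY (MovieId, AttributeId)
);
INSERT INTO Movies VALUES (1, 'The Godfather');
INSERT INTO AttributeTypes VALUES (1, 'Country'), (2, 'Type');
INSERT INTO Attributes VALUES
    (1, 1, 'USA'), (2, 1, 'Japan'), (3, 2, 'Thriller'), (4, 2, 'Crime');
INSERT INTO MovieAttributes VALUES (1, 1), (1, 4);
""")

# Listing one movie's attributes already takes a join across all four tables.
rows = conn.execute("""
    SELECT m.Title, t.Description, a.Description
    FROM MovieAttributes ma
    JOIN Movies m ON m.MovieId = ma.MovieId
    JOIN Attributes a ON a.AttributeId = ma.AttributeId
    JOIN AttributeTypes t ON t.AttributeTypeId = a.AttributeTypeId
    ORDER BY t.Description
""").fetchall()
print(rows)  # [('The Godfather', 'Country', 'USA'), ('The Godfather', 'Type', 'Crime')]
```

As sketched, the key (MovieId, AttributeId) would still let a movie have both USA and Japan; limiting each movie to one attribute per type would need an extra constraint.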

(In other words, yes, I agree with prior posts, and recommend avoiding EAV structures.)

Answered Oct 22 '22 17:10 by Philip Kelley