
Database Design For Multiple Product Types with variable attributes

Tags:

database

I have a database containing different product types. The fields of each type differ greatly from those of the others. Products of the first type are classified into three categories, as are products of the second type. Products of the third and fourth types are not classified at all.

Each product can have any number of different properties.

I am using a database model that basically looks like this (see the link): https://www.damirsystems.com/static/img/product_model_01.png

I have a huge database, with about 500,000 products in the product table.

So when I fetch a product from the database with all its attributes, or search for products filtered by attributes, performance suffers badly.

Could anyone suggest a suitable table structure in SQL, additional indexing, or any other feasible solution to this problem? Various e-commerce sites use this kind of database and work fine with a huge range of product types.


EDIT : The link to the image (on my site) is blocked, so here is the image

[image: product model diagram]

asked Jun 15 '13 10:06 by user2455135



1 Answer

The model you link to looks like a partial entity–attribute–value (EAV) model. EAV is very flexible, but offers poor data integrity, and is cumbersome and usually inefficient. It's not really in the spirit of the relational model. Having worked on some large e-commerce sites, I can tell you that this is not standard or good database design practice in this field.

If you don't have an enormous number of types of product (up to tens, but not hundreds) then you can handle this using one of two common approaches.

The first approach is simply to have a single table for products, with columns for all the attributes that might be needed in each different kind of product. You use whichever columns are appropriate to each kind of product, and leave the rest null. Say you sell books, music, and video:

create table Product (
    id integer primary key,
    name varchar(255) not null,
    type char(1) not null check (type in ('B', 'M', 'V')),
    number_of_pages integer, -- book only
    duration_in_seconds integer, -- music and video only
    classification varchar(2) check (classification in ('U', 'PG', '12', '15', '18')) -- video only
);

This has the advantage of being simple, and of not requiring joins. However, it doesn't do a good job of enforcing integrity on your data (you could have a book without a number of pages, for example), and if you have more than a few types of products, the table will get highly unwieldy.
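One way to see both the convenience and the integrity gap is to load the table into a scratch database. The sketch below uses Python's built-in sqlite3 module purely as a test bed (an assumption; the DDL above is generic SQL), and the sample rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite database used only as a test bed for the single-table design.
conn = sqlite3.connect(":memory:")
conn.execute("""
    create table Product (
        id integer primary key,
        name varchar(255) not null,
        type char(1) not null check (type in ('B', 'M', 'V')),
        number_of_pages integer,      -- book only
        duration_in_seconds integer,  -- music and video only
        classification varchar(2)
            check (classification in ('U', 'PG', '12', '15', '18'))  -- video only
    )
""")

# Hypothetical sample rows; the second book has no page count, yet it is accepted.
conn.executemany(
    "insert into Product values (?, ?, ?, ?, ?, ?)",
    [
        (1, "Long Novel", "B", 450, None, None),
        (2, "Mystery Book", "B", None, None, None),
        (3, "Some Album", "M", None, 2400, None),
        (4, "Some Film", "V", None, 5400, "PG"),
    ],
)

# Filtering by an attribute needs no join.
long_books = conn.execute(
    "select id, name from Product where type = 'B' and number_of_pages > 300"
).fetchall()

# Books with a missing page count slip through unnoticed.
books_without_pages = conn.execute(
    "select id from Product where type = 'B' and number_of_pages is null"
).fetchall()
```

Here `long_books` holds only the first book, while `books_without_pages` shows that the incomplete row was stored without complaint.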

You can plaster over the integrity problem with table-level check constraints that require each type of product to have values in certain columns, like this:

check ((case when type = 'B' then (number_of_pages is not null) else true end))

(hat tip to Joe Celko there - I looked up how to do logical implication in SQL, and found an example where he does it with this construction to construct a very similar check constraint!)

You might even say:

check ((case when type = 'B' then (number_of_pages is not null) else (number_of_pages is null) end))

This ensures that no row has a value in a column not appropriate to its type.
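To check that the stricter constraint behaves as described, it can be tried against a scratch database. This is a minimal sketch using Python's sqlite3 module (an assumption, chosen only because it needs no setup); the table is trimmed to the relevant columns, and `try_insert` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table Product (
        id integer primary key,
        name varchar(255) not null,
        type char(1) not null check (type in ('B', 'M', 'V')),
        number_of_pages integer,
        -- books must have a page count; everything else must not
        check (case when type = 'B'
                    then number_of_pages is not null
                    else number_of_pages is null end)
    )
""")

def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("insert into Product values (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

book_ok = try_insert((1, "Long Novel", "B", 450))           # book with pages
book_missing = try_insert((2, "Mystery Book", "B", None))   # book without pages
music_ok = try_insert((3, "Some Album", "M", None))         # music without pages
music_pages = try_insert((4, "Odd Album", "M", 12))         # music with pages
```

Only the first and third rows should be accepted; the other two violate the check in one direction or the other.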

The second approach is to use multiple tables: one base table holding columns common to all products, and one auxiliary table for each type of product holding columns specific to products of that type. So:

create table Product (
    id integer primary key,
    type char(1) not null check (type in ('B', 'M', 'V')),
    name varchar(255) not null
);

create table Book (
    id integer primary key references Product,
    number_of_pages integer not null
);

create table Music (
    id integer primary key references Product,
    duration_in_seconds integer not null
);

create table Video (
    id integer primary key references Product,
    duration_in_seconds integer not null,
    classification varchar(2) not null check (classification in ('U', 'PG', '12', '15', '18'))
);

Note that the auxiliary tables have the same primary key as the main table; their primary key column is also a foreign key to the main table.

This approach is still fairly straightforward, and does a better job of enforcing integrity. Queries will typically involve joins, though:

select
  p.id,
  p.name
from
  Product p
  join Book b on p.id = b.id
where
  b.number_of_pages > 300;
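The base-plus-auxiliary layout and the join can be tried out as follows. This is a sketch in Python's sqlite3 module (an assumption made for convenience), restricted to the Product and Book tables, with invented sample rows. Note that SQLite only enforces foreign keys once `pragma foreign_keys = on` has been issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    create table Product (
        id integer primary key,
        type char(1) not null check (type in ('B', 'M', 'V')),
        name varchar(255) not null
    );
    create table Book (
        id integer primary key references Product,
        number_of_pages integer not null
    );
    insert into Product values (1, 'B', 'Long Novel'), (2, 'B', 'Short Novel');
    insert into Book values (1, 450), (2, 120);
""")

# The query from the text: common columns from Product, the filter on Book.
long_books = conn.execute("""
    select p.id, p.name
    from Product p
    join Book b on p.id = b.id
    where b.number_of_pages > 300
""").fetchall()

# A Book row with no matching Product row is rejected by the foreign key.
try:
    conn.execute("insert into Book values (99, 200)")
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False
conn.rollback()
```

The join returns only the 450-page book, and the auxiliary row without a parent is refused.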

Integrity is still not perfect, because it's possible to create a row in an auxiliary table corresponding to a row of the wrong type in the main table, or to create rows in multiple auxiliary tables corresponding to a single row in the main table. You can fix that by refining the model further. If you make the primary key a composite key which includes the type column, then the type of a product is embedded in its primary key (a book would have a primary key like ('B', 1001)). You would need to introduce the type column into the auxiliary tables so that they could have foreign keys pointing to the main table, and at that point you could add a check constraint in each auxiliary table that requires the type to be correct. Like this:

create table Product (
    type char(1) not null check (type in ('B', 'M', 'V')),
    id integer not null,
    name varchar(255) not null,
    primary key (type, id)
);

create table Book (
    type char(1) not null check (type = 'B'),
    id integer not null,
    number_of_pages integer not null,
    primary key (type, id),
    foreign key (type, id) references Product
);

This also makes it easier to query the right tables given only a primary key - you can immediately tell what kind of product it refers to without having to query the main table first.
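The effect of the composite key can be checked in a scratch database. The following sketch again uses Python's sqlite3 module as an assumed test bed; `try_book` is a hypothetical helper and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # required for SQLite to enforce foreign keys
conn.executescript("""
    create table Product (
        type char(1) not null check (type in ('B', 'M', 'V')),
        id integer not null,
        name varchar(255) not null,
        primary key (type, id)
    );
    create table Book (
        type char(1) not null check (type = 'B'),
        id integer not null,
        number_of_pages integer not null,
        primary key (type, id),
        foreign key (type, id) references Product
    );
    insert into Product values ('B', 1001, 'Long Novel'), ('M', 2001, 'Some Album');
""")

def try_book(row):
    """Return True if the Book row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("insert into Book values (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

right_type = try_book(("B", 1001, 450))    # matches a book row in Product
wrong_parent = try_book(("B", 2001, 450))  # ('B', 2001) is not in Product: foreign key fails
wrong_flag = try_book(("M", 2001, 450))    # check (type = 'B') fails
```

Only the first insert should succeed: a Book row can no longer be attached to a product of another type, whichever type value it claims.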

There is still a potential problem of duplication of columns - as in the schema above, where the duration column is duplicated in two tables. You can fix that by introducing intermediate auxiliary tables for the shared columns:

create table Media (
    type char(1) not null check (type in ('M', 'V')),
    id integer not null,
    duration_in_seconds integer not null,
    primary key (type, id),
    foreign key (type, id) references Product
);

create table Music (
    type char(1) not null check (type = 'M'),
    id integer not null,
    primary key (type, id),
    foreign key (type, id) references Product
);

create table Video (
    type char(1) not null check (type = 'V'),
    id integer not null,
    classification varchar(2) not null check (classification in ('U', 'PG', '12', '15', '18')),
    primary key (type, id),
    foreign key (type, id) references Product
);

You might not think that was worth the extra effort. However, what you might consider doing is mixing the two approaches (single table and auxiliary table) to deal with situations like this, and having a shared table for some similar kinds of products:

create table Media (
    type char(1) not null check (type in ('M', 'V')),
    id integer not null,
    duration_in_seconds integer not null,
    classification varchar(2) check (classification in ('U', 'PG', '12', '15', '18')),
    primary key (type, id),
    foreign key (type, id) references Product,
    check ((case when type = 'V' then (classification is not null) else (classification is null) end))
);

That would be particularly useful if there were similar kinds of products that were lumped together in the application. In this example, if your shopfront presents audio and video together, but separately from books, then this structure could support much more efficient retrieval than having separate auxiliary tables for each kind of media.
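As an illustration of that retrieval pattern, the sketch below lists music and video together with a single join against the shared Media table. It assumes Python's sqlite3 module as a test bed, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # required for SQLite to enforce foreign keys
conn.executescript("""
    create table Product (
        type char(1) not null check (type in ('B', 'M', 'V')),
        id integer not null,
        name varchar(255) not null,
        primary key (type, id)
    );
    create table Media (
        type char(1) not null check (type in ('M', 'V')),
        id integer not null,
        duration_in_seconds integer not null,
        classification varchar(2) check (classification in ('U', 'PG', '12', '15', '18')),
        primary key (type, id),
        foreign key (type, id) references Product,
        check (case when type = 'V' then classification is not null
                    else classification is null end)
    );
    insert into Product values
        ('B', 1, 'Long Novel'), ('M', 2, 'Some Album'), ('V', 3, 'Some Film');
    insert into Media values ('M', 2, 2400, null), ('V', 3, 5400, 'PG');
""")

# One join covers both kinds of media; books never enter the picture.
media_listing = conn.execute("""
    select p.type, p.name, m.duration_in_seconds, m.classification
    from Product p
    join Media m on p.type = m.type and p.id = m.id
    order by p.id
""").fetchall()

# A video without a classification is still rejected by the table-level check.
try:
    conn.execute("insert into Product values ('V', 4, 'Unrated Film')")
    conn.execute("insert into Media values ('V', 4, 3600, null)")
    unrated_video_accepted = True
except sqlite3.IntegrityError:
    unrated_video_accepted = False
conn.rollback()
```

With separate Music and Video tables, the same listing would need two joins or a union.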

All of these approaches share a loophole: it's still possible to create rows in the main table without corresponding rows in any auxiliary table. To fix this, you need a second set of foreign key constraints, this time from the main table to the auxiliary tables. This is particularly fun for a couple of reasons: you want exactly one of the possible foreign key relationships to be enforced at once, and the relationship creates a circular dependency between rows in the two tables. You can achieve the former using some conditionals in check constraints, and the latter using deferrable constraints. The auxiliary tables can be the same as above, but the main table needs to grow what I will tentatively call 'type flag' columns:

create table Product (
    type char(1) not null check (type in ('B', 'M', 'V')),
    id integer not null,

    is_book char(1) null check (is_book is not distinct from (case type when 'B' then type else null end)),
    is_music char(1) null check (is_music is not distinct from (case type when 'M' then type else null end)),
    is_video char(1) null check (is_video is not distinct from (case type when 'V' then type else null end)),

    name varchar(255) not null,
    primary key (type, id)
);

The type flag columns are essentially repetitions of the type column, one for each potential type, which are set if and only if the product is of that type (as enforced by those check constraints). These are real columns, so values will have to be supplied for them when inserting rows, even though the values are completely predictable; this is a bit ugly, but hopefully not a showstopper.

With those in place, then after all the tables are created, you can form foreign keys using the type flags instead of the type, pointing to specific auxiliary tables:

alter table Product add foreign key (is_book, id) references Book deferrable initially deferred;
alter table Product add foreign key (is_music, id) references Music deferrable initially deferred;
alter table Product add foreign key (is_video, id) references Video deferrable initially deferred;

Crucially, for a foreign key relationship to be enforced, all its constituent columns must be non-null. Therefore, for any given row, because only one type flag is non-null, only one relationship will be enforced. Because these constraints are deferrable, it is possible to insert a row into the main table before the required row in the auxiliary table exists. As long as the auxiliary row is inserted before the transaction is committed, it's all above board.
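The deferred, circular arrangement can be tried in a scratch database too. This is a sketch under several assumptions: it uses Python's sqlite3 module, it is trimmed to two product types, it declares the deferred foreign keys inline because SQLite has no `alter table ... add foreign key`, and it spells the null-safe comparison with SQLite's `is` operator in place of `is not distinct from`. `add_product` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # required for SQLite to enforce foreign keys
conn.executescript("""
    create table Product (
        type char(1) not null check (type in ('B', 'M')),
        id integer not null,
        is_book char(1) check (is_book is (case type when 'B' then type else null end)),
        is_music char(1) check (is_music is (case type when 'M' then type else null end)),
        name varchar(255) not null,
        primary key (type, id),
        foreign key (is_book, id) references Book deferrable initially deferred,
        foreign key (is_music, id) references Music deferrable initially deferred
    );
    create table Book (
        type char(1) not null check (type = 'B'),
        id integer not null,
        number_of_pages integer not null,
        primary key (type, id),
        foreign key (type, id) references Product
    );
    create table Music (
        type char(1) not null check (type = 'M'),
        id integer not null,
        duration_in_seconds integer not null,
        primary key (type, id),
        foreign key (type, id) references Product
    );
""")

def add_product(product_row, aux_sql=None, aux_row=None):
    """Insert a Product row and, optionally, its auxiliary row in one transaction."""
    try:
        conn.execute("insert into Product values (?, ?, ?, ?, ?)", product_row)
        if aux_sql is not None:
            conn.execute(aux_sql, aux_row)
        conn.commit()  # deferred foreign keys are checked here
        return True
    except sqlite3.IntegrityError:
        conn.rollback()
        return False

# Product first, Book second, same transaction: accepted.
complete = add_product(("B", 1, "B", None, "Long Novel"),
                       "insert into Book values (?, ?, ?)", ("B", 1, 450))
# Product with no Book row: rejected when the transaction commits.
orphan = add_product(("B", 2, "B", None, "Lonely Novel"))
# Type flag that contradicts the type: rejected by the check constraint.
bad_flag = add_product(("B", 3, None, "M", "Confused Novel"))
```

Only the first call should leave a row behind. The second fails at commit time, which is the point at which the deferred constraint is evaluated.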

answered Sep 19 '22 16:09 by Tom Anderson