Database schema design for large number of columns

Question

I have a use case where I need to model reference data for e.g. different flavors of ice cream. Say I have 50 flavors of ice cream :-

20 attributes e.g. freezing-temp, creaminess will be shared across all flavors
every flavor of ice cream would have 20-30 attributes that will not be shared with other flavors e.g. :-
- Strawberry ice cream might track tartness, fruit percentage etc.
- Chocolate ice cream might track bitterness, cocoa level etc.

How would I model this data neatly in a database model, purely from a storage / retrieval point of view?

The options I can think of :-

One table per flavor. This will need 50 tables, and each table will have 20 columns that will overlap with each other, and another 20-30 attributes that will be unique to the flavor.
- Pros : models the data of each flavor quite well
- Cons : column overlap and large number of tables needed
One table for all flavors. This will only need 1 table, but will require 1000+ columns most of which would be empty.
- Pros : models the data of ice cream in general, quite well
- Cons : large number of columns and large amount of 'wasted' space
One key-value table for all flavors, with flavor Id, attribute name and attribute value.
- Pros : simplest to create and insert data
- Cons : harder to extract, not really a data model per se, difficult to form constraints for attributes, or for attributes related to other attributes

Jon Heller · Accepted Answer

Never store a value in the wrong type.

enter image description here

Whatever design you choose, make sure that values are stored in their natural format. Use NUMBER, DATE, VARCHAR2, CLOB, XMLTYPE, CLOB (IS JSON), TIMESTAMP, etc. Trying to cram everything in a string will cause many problems. You lose validation, convenience, performance, and type safety.

For example, here is a common type safety problem. Imagine this simple query to find ice cream that is more than 25% fruit:

select *
from   ice_cream_flavor_attribute
where  attribute_name = 'Fruit Percentage'
   and attribute_value > 25;

Do you see the bug? Do you see how the same query, with the same data, may work one day and fail the next with ORA-01722: invalid number?

It's difficult to write a query that forces Oracle to evaluate conditions in a specific order. Re-ordering the predicates won't help (99.9% of the time). Adding an inline view won't help (99.9% of the time). Using a CASE statement will work but not 100% of the time. Using hints will work but is tricky. Using an inline view and a ROWNUM is my preferred way of solving the problem but it looks odd and is difficult to understand.

If you must use an Entity Attribute Value model (and if you have more than 1000 attributes it may be unavoidable), at least use the right types.

Don't worry about space - a null column uses at most 1 byte.

Don't worry about complaints like "but then our queries are more complicated, we always need to know which column to use!" - realistically there is almost nothing useful you can do with a value without knowing its type. Every time you read or write a value you must already be thinking about the type.

Database schema design for large number of columns

Tags:

oracle

database-design

Ronbear

1 Answers

Jon Heller

Recent Activity

Donate For Us

Database schema design for large number of columns

Tags:

oracle

database-design

Ronbear

1 Answers

Jon Heller

Related questions

Recent Activity

Donate For Us