Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Database schema design for large number of columns

I have a use case where I need to model reference data for e.g. different flavors of ice cream. Say I have 50 flavors of ice cream :-

  • 20 attributes e.g. freezing-temp, creaminess will be shared across all flavors
  • every flavor of ice cream would have 20-30 attributes that will not be shared with other flavors e.g. :-
    • Strawberry ice cream might track tartness, fruit percentage etc.
    • Chocolate ice cream might track bitterness, cocoa level etc.

How would I model this data neatly in a database model, purely from a storage / retrieval point of view?

The options I can think of :-

  1. One table per flavor. This will need 50 tables, and each table will have 20 columns that will overlap with each other, and another 20-30 attributes that will be unique to the flavor.
    • Pros : models the data of each flavor quite well
    • Cons : column overlap and large number of tables needed
  2. One table for all flavors. This will only need 1 table, but will require 1000+ columns most of which would be empty.
    • Pros : models the data of ice cream in general, quite well
    • Cons : large number of columns and large amount of 'wasted' space
  3. One key-value table for all flavors, with flavor Id, attribute name and attribute value.
    • Pros : simplest to create and insert data
    • Cons : harder to extract, not really a data model per se, difficult to form constraints for attributes, or for attributes related to other attributes
like image 886
Ronbear Avatar asked Nov 10 '22 18:11

Ronbear


1 Answers

Never store a value in the wrong type.

enter image description here

Whatever design you choose, make sure that values are stored in their natural format. Use NUMBER, DATE, VARCHAR2, CLOB, XMLTYPE, CLOB (IS JSON), TIMESTAMP, etc. Trying to cram everything in a string will cause many problems. You lose validation, convenience, performance, and type safety.

For example, here is a common type safety problem. Imagine this simple query to find ice cream that is more than 25% fruit:

select *
from   ice_cream_flavor_attribute
where  attribute_name = 'Fruit Percentage'
   and attribute_value > 25;

Do you see the bug? Do you see how the same query, with the same data, may work one day and fail the next with ORA-01722: invalid number?

It's difficult to write a query that forces Oracle to evaluate conditions in a specific order. Re-ordering the predicates won't help (99.9% of the time). Adding an inline view won't help (99.9% of the time). Using a CASE statement will work but not 100% of the time. Using hints will work but is tricky. Using an inline view and a ROWNUM is my preferred way of solving the problem but it looks odd and is difficult to understand.


If you must use an Entity Attribute Value model (and if you have more than 1000 attributes it may be unavoidable), at least use the right types.

Don't worry about space - a null column uses at most 1 byte.

Don't worry about complaints like "but then our queries are more complicated, we always need to know which column to use!" - realistically there is almost nothing useful you can do with a value without knowing its type. Every time you read or write a value you must already be thinking about the type.

like image 86
Jon Heller Avatar answered Nov 15 '22 05:11

Jon Heller