
Database design with dynamic fields: single table vs many tables - many indexes

I have to choose the structure of a database that will store content types (e.g. Blog articles, Pages, Documents, Invoices, Estimates, etc.) with dynamic fields. For example, the Estimate content type should have the fields title, date and total price.

However, fields can be added or removed over time, so after a year the Estimate content type might also have a notes field.

This is a common feature of well-known CMSs (Drupal, for example), but I wonder which approach gives the best performance and flexibility. Drupal, for example, uses one table for the basic fields (e.g. title) and stores every secondary field in its own sub-table, created on the fly and linked to the main table with a foreign key:

table node
| id | title         | ...
|  1 | First example |
table fields_node_total_price
| id | node_id | value  |
|  1 | 1       | 123.45 |
table fields_node_date
| id | node_id | value    |
|  1 | 1       | 12345677 |

etc.

In my view, this approach is very flexible but easily runs into performance issues: to get all the fields of a document you must join many tables, and the code itself has to iterate over the fields to build the query (though that part shouldn't be a problem).
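A minimal sketch of the field-per-table layout above, using SQLite in place of the question's database. The table names follow the question's example, while the `FIELDS` list and the helper functions are hypothetical. It shows why the query grows by one join per field.

```python
import sqlite3

# Hypothetical field registry: one sub-table per dynamic field.
FIELDS = ["total_price", "date"]

def build_schema(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, title TEXT)")
    for field in FIELDS:
        # Each field table points back to node through node_id.
        conn.execute(
            f"CREATE TABLE fields_node_{field} ("
            "id INTEGER PRIMARY KEY, "
            "node_id INTEGER NOT NULL REFERENCES node(id), "
            "value)"
        )

def build_select() -> str:
    # One LEFT JOIN per field: this is the join cost the question worries about.
    cols = ", ".join(f"f_{f}.value AS {f}" for f in FIELDS)
    joins = " ".join(
        f"LEFT JOIN fields_node_{f} AS f_{f} ON f_{f}.node_id = node.id"
        for f in FIELDS
    )
    return f"SELECT node.id, node.title, {cols} FROM node {joins} WHERE node.id = ?"

conn = sqlite3.connect(":memory:")
build_schema(conn)
conn.execute("INSERT INTO node (id, title) VALUES (1, 'First example')")
conn.execute("INSERT INTO fields_node_total_price (node_id, value) VALUES (1, 123.45)")
conn.execute("INSERT INTO fields_node_date (node_id, value) VALUES (1, 12345677)")

row = conn.execute(build_select(), (1,)).fetchone()
print(row)  # (1, 'First example', 123.45, 12345677)
```

With 50 fields, the same generator emits 50 joins for a single node.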

That said, the multi-table approach is the most widely used, so the single-table alternative must have significant drawbacks.

I'm wondering what disadvantages a single table would have:

| id | title | total_price | date | etc.

I ran some tests with 5 and 50 additional fields, and the performance difference between the single-table and the multi-table approach is enormous: the single table is about 50x faster.

Every time a field is added, a column is added to the table. What kind of problems will this approach raise?
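A minimal sketch of the single-table alternative, using SQLite. The `estimate` table and the `add_field` helper are hypothetical. It shows that every new field is a schema change (DDL) rather than a data change, and that existing rows read the new column as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical single-table layout for the Estimate content type.
conn.execute(
    "CREATE TABLE estimate (id INTEGER PRIMARY KEY, title TEXT, "
    "date INTEGER, total_price REAL)"
)
conn.execute(
    "INSERT INTO estimate (title, date, total_price) VALUES (?, ?, ?)",
    ("First example", 12345677, 123.45),
)

def add_field(conn: sqlite3.Connection, table: str, name: str, sql_type: str) -> None:
    # Identifiers cannot be bound as parameters, so table/name/type must
    # come from a trusted field definition, never from user input.
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type}")

add_field(conn, "estimate", "notes", "TEXT")

# Rows written before the schema change get NULL for the new column.
print(conn.execute("SELECT title, notes FROM estimate").fetchall())
# [('First example', None)]
```

In SQLite, ADD COLUMN only changes the schema. On other engines, an ALTER TABLE on a large table may rebuild or lock it, depending on engine and version. Every engine also caps the number of columns and the row size per table.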

EDIT

Let me provide a few details:

  1. The application is still in the design phase; it is a complete redesign of an old application in which the number of fields was static.
  2. We ran a few tests simulating an object to store, with both the single-table and the multi-table approach (using 50 fields). The results are:

Time in seconds:

| Test | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Avg |
| 1000 insert single table | 8.5687 | 8.6832 | 8.7143 | 8.7977 | 8.6906 | 8.6909 |
| 1000 select single table LIKE '%key%' on char(250) field | 1.5539 | 1.5540 | 1.5591 | 1.5602 | 1.5564 | 1.5567 |
| 1000 select single table LIKE '%key%' on char(25) field | 0.8848 | 0.8923 | 0.8894 | 0.8919 | 0.8888 | 0.8894 |
| 1000 select single table id = $n | 0.2645 | 0.2620 | 0.2645 | 0.2632 | 0.2636 | 0.2636 |
| 1000 select single table integer field < $j | 0.8627 | 0.8759 | 0.8673 | 0.8713 | 0.8767 | 0.8708 |
| 1000 insert multi table | 446.3830 | 445.2843 | 440.8151 | 436.6051 | 446.0302 | 443.0235 |
| 1000 select multi table LIKE '%key%' on char(250) field | 1.7048 | 1.6822 | 1.6817 | 1.7041 | 1.6840 | 1.6914 |
| 1000 select multi table LIKE '%key%' on char(25) field | 0.9391 | 0.9365 | 0.9382 | 0.9431 | 0.9408 | 0.9395 |
| 1000 select multi table id = $n | 0.9336 | 0.9287 | 0.9349 | 0.9331 | 0.9428 | 0.9346 |
| 1000 select multi table integer field < $j | 2.3366 | 2.3260 | 2.3134 | 2.3342 | 2.3228 | 2.3266 |
Strae asked Feb 21 '23 21:02



1 Answer

It may be worthwhile investigating what is possible with NoSQL databases. I haven't used them much myself, but given you say you need to "...store content types (eg. Blog articles, Pages, Documents, Invoices, Estimates, etc..) with dynamic fields" it seems as though it may be a reasonable approach.

From the Wikipedia article:

...These data stores may not require fixed table schemas, usually avoid join operations, and typically scale horizontally.

and

Often, NoSQL databases are categorized according to the way they store the data and it falls under categories such as Key-Value stores, BigTable Implementations, Document-Store databases and Graph Database.

I'm not saying it is the answer to all your problems, but I'd certainly say it's worth a look.

With regards to other approaches, I've used Entity-Attribute-Value (EAV) in the past, and while the performance probably lags behind having a fixed schema, I feel it is a compromise that had to be made to afford the flexibility in the schema.
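A minimal sketch of a plain EAV layout in SQLite, to make the trade-off concrete. The `entity`/`entity_value` names and the sample values are hypothetical and are not the answer's schema. New fields need no DDL, but every value shares one generic column, so typing is lost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical EAV layout: one entity table plus one attribute/value table.
conn.executescript(
    """
    CREATE TABLE entity (id INTEGER PRIMARY KEY, content_type TEXT NOT NULL);
    CREATE TABLE entity_value (
        entity_id INTEGER NOT NULL REFERENCES entity(id),
        attribute TEXT NOT NULL,
        value TEXT,
        PRIMARY KEY (entity_id, attribute)
    );
    """
)
conn.execute("INSERT INTO entity (id, content_type) VALUES (1, 'Estimate')")
conn.executemany(
    "INSERT INTO entity_value (entity_id, attribute, value) VALUES (?, ?, ?)",
    [(1, "title", "First example"), (1, "total_price", "123.45")],
)

# Adding a field later is just another row, with no ALTER TABLE.
conn.execute(
    "INSERT INTO entity_value (entity_id, attribute, value) "
    "VALUES (1, 'notes', 'Call back')"
)

def load_entity(conn: sqlite3.Connection, entity_id: int) -> dict:
    rows = conn.execute(
        "SELECT attribute, value FROM entity_value WHERE entity_id = ?",
        (entity_id,),
    )
    return dict(rows.fetchall())

print(load_entity(conn, 1))
```

Note that `total_price` comes back as the string "123.45". Numeric comparisons need casts, which is part of the performance cost the answer mentions.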

My situation is likely to differ from yours, but I'll lay it out in case it helps. We broke the table structure into something that was logical for our situation. There is a natural hierarchy: a parent table to which most of the other tables relate.

Even though we needed dynamic structure due to the variety of the data we are dealing with, there was also some fixed structure. Therefore, for each table requiring dynamic structure, we created a "main" table, and an "attribute" table.

An example of this (SQL Server specific) is below:

CREATE TABLE [dbo].[ParentTbl](
    [Id] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
    [KnownCol1] [real] NOT NULL,
        -- Lots of other columns omitted
    [KnownColn] [real] NULL
);

CREATE TABLE [dbo].[MainTbl](
    [Id] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
    [ParentId] [int] NOT NULL
        REFERENCES [dbo].[ParentTbl]([Id]), -- FK to ParentTbl.Id
    [KnownCol1] [real] NOT NULL,
        -- Lots of other columns omitted
    [KnownColn] [real] NULL
);

CREATE TABLE [dbo].[MainTblAttr](
    [Id] [bigint] IDENTITY(1,1) NOT NULL PRIMARY KEY, -- Note bigint to cater for LOTS of records
    [MainId] [int] NOT NULL
        REFERENCES [dbo].[MainTbl]([Id]), -- FK to MainTbl.Id
    [AttributeColumn] [nvarchar](255) NOT NULL,
    [AttributeValue] [nvarchar](max) NOT NULL
);

You can then use a PIVOT query to get your data out. Since different records will have different attributes, you need to determine which columns to include in the pivot. I found this example invaluable when I was developing my solution, but there are loads of examples on SO; just search for "pivot dynamic columns".
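A minimal sketch of a dynamic pivot against a trimmed SQLite stand-in for `MainTbl`/`MainTblAttr`. SQLite has no PIVOT operator, so this swaps in conditional aggregation (`MAX(CASE ...)`), a portable substitute. The attribute names and values are hypothetical. The column list is read from the data first, which is the "dynamic columns" step described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE MainTbl (Id INTEGER PRIMARY KEY, ParentId INTEGER NOT NULL);
    CREATE TABLE MainTblAttr (
        Id INTEGER PRIMARY KEY,
        MainId INTEGER NOT NULL REFERENCES MainTbl(Id),
        AttributeColumn TEXT NOT NULL,
        AttributeValue TEXT NOT NULL
    );
    INSERT INTO MainTbl (Id, ParentId) VALUES (1, 10), (2, 10);
    INSERT INTO MainTblAttr (MainId, AttributeColumn, AttributeValue) VALUES
        (1, 'colour', 'red'), (1, 'size', 'L'), (2, 'colour', 'blue');
    """
)

def quote_ident(name: str) -> str:
    # Attribute names become output column names, so quote them as identifiers.
    return '"' + name.replace('"', '""') + '"'

def dynamic_pivot(conn: sqlite3.Connection):
    # Step 1: discover which attributes exist.
    attrs = [r[0] for r in conn.execute(
        "SELECT DISTINCT AttributeColumn FROM MainTblAttr ORDER BY AttributeColumn")]
    # Step 2: one MAX(CASE ...) per attribute; absent attributes yield NULL.
    pieces = ["m.Id"] + [
        f"MAX(CASE WHEN a.AttributeColumn = ? THEN a.AttributeValue END) AS {quote_ident(x)}"
        for x in attrs
    ]
    sql = (
        f"SELECT {', '.join(pieces)} FROM MainTbl AS m "
        "LEFT JOIN MainTblAttr AS a ON a.MainId = m.Id "
        "GROUP BY m.Id ORDER BY m.Id"
    )
    # Attribute names are bound as parameters, one per CASE, in order.
    return attrs, conn.execute(sql, attrs).fetchall()

attrs, rows = dynamic_pivot(conn)
print(attrs)  # ['colour', 'size']
print(rows)   # [(1, 'red', 'L'), (2, 'blue', None)]
```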

In my case, having a parent table is a big help in limiting the amount of data I need to trawl through, because it limits the child records I need to look at. This might not be so in your case, but hopefully it will give you some ideas.
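A minimal sketch of that parent-first filtering, again in SQLite with trimmed versions of the answer's tables. The sample data, index names and `attributes_for_parent` helper are hypothetical. Indexing the FK columns (neither SQLite nor SQL Server does this automatically) lets the join read only one parent's attribute rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ParentTbl (Id INTEGER PRIMARY KEY);
    CREATE TABLE MainTbl (Id INTEGER PRIMARY KEY,
                          ParentId INTEGER NOT NULL REFERENCES ParentTbl(Id));
    CREATE TABLE MainTblAttr (Id INTEGER PRIMARY KEY,
                              MainId INTEGER NOT NULL REFERENCES MainTbl(Id),
                              AttributeColumn TEXT NOT NULL,
                              AttributeValue TEXT NOT NULL);
    -- Hypothetical indexes on the FK columns used by the filter and the join.
    CREATE INDEX IX_MainTbl_ParentId ON MainTbl(ParentId);
    CREATE INDEX IX_MainTblAttr_MainId ON MainTblAttr(MainId);
    INSERT INTO ParentTbl (Id) VALUES (1), (2);
    INSERT INTO MainTbl (Id, ParentId) VALUES (10, 1), (20, 2);
    INSERT INTO MainTblAttr (MainId, AttributeColumn, AttributeValue) VALUES
        (10, 'colour', 'red'), (20, 'colour', 'blue'), (20, 'size', 'L');
    """
)

def attributes_for_parent(conn: sqlite3.Connection, parent_id: int):
    # Filter on the parent first, so only its children's attributes are read.
    sql = """
        SELECT m.Id, a.AttributeColumn, a.AttributeValue
        FROM MainTbl AS m
        JOIN MainTblAttr AS a ON a.MainId = m.Id
        WHERE m.ParentId = ?
        ORDER BY m.Id, a.AttributeColumn
    """
    return conn.execute(sql, (parent_id,)).fetchall()

print(attributes_for_parent(conn, 2))
# [(20, 'colour', 'blue'), (20, 'size', 'L')]
```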

Best of luck.

Mr Moose answered Apr 27 '23 08:04
