I have to choose the structure of a database that will store content types (e.g. blog articles, pages, documents, invoices, estimates, etc.) with dynamic fields: for example, the Estimate content type should have the fields title, date and total price.
However, over time those fields can be added or removed, so after a year the Estimate content type might also have a notes field.
This is a common task handled by well-known CMSs (Drupal, for example), but I wonder what the best approach is for performance and flexibility. Drupal, for instance, keeps one table with the basic fields (e.g. title), while all the secondary fields are stored in sub-tables created on the fly and linked to the main one with foreign keys:
table node
| id | title         | ... |
| 1  | First example |     |

table fields_node_total_price
| id | node_id | value  |
| 1  | 1       | 123.45 |

table fields_node_date
| id | node_id | value    |
| 1  | 1       | 12345677 |

etc.
My view is that this approach is very flexible but easily runs into performance issues: in order to get all the fields for a document you must join the tables many times, and the code itself has to iterate many times just to build the query (though that part shouldn't be a problem). A sketch of such a query is below.
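For instance, fetching one document with its two extra fields from the multi-table layout looks roughly like this (a sketch against the example tables above; each additional field costs one more join):

SELECT n.id,
       n.title,
       p.value AS total_price,
       d.value AS date_value  -- aliased to avoid the reserved word DATE
FROM node n
LEFT JOIN fields_node_total_price p ON p.node_id = n.id
LEFT JOIN fields_node_date d ON d.node_id = n.id
WHERE n.id = 1;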
By the way, multi-table is the most widely used approach, so the single-table alternative must have significant drawbacks.
I'm wondering what kind of disadvantages using a single table would have:
| id | title | total_price | date | etc...
I did some tests with 5 and 50 additional fields; the performance difference between the single-table approach and the multi-table approach is enormous: the single table is about 50x faster.
Every time a field is added, a column is added to the table; what kind of problems will this approach raise?
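For reference, adding a field in the single-table approach is just a schema change along these lines (an illustrative sketch assuming MySQL-style syntax; the notes column and its type are only an assumption):

ALTER TABLE node ADD COLUMN notes TEXT NULL;  -- the new dynamic field becomes a real column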
Let me provide a few details:
Time in seconds:

| Test                                                     | 1°       | 2°       | 3°       | 4°       | 5°       | avg              |
| 1000 insert single_table                                 | 8,5687   | 8,6832   | 8,7143   | 8,7977   | 8,6906   | 8,69090137389466 |
| 1000 select single table LIKE '%key%' on char(250) field | 1,5539   | 1,5540   | 1,5591   | 1,5602   | 1,5564   | 1,556705142      |
| 1000 select single table LIKE '%key%' on char(25) field  | 0,8848   | 0,8923   | 0,8894   | 0,8919   | 0,8888   | 0,889427996      |
| 1000 select single table id = $n                         | 0,2645   | 0,2620   | 0,2645   | 0,2632   | 0,2636   | 0,263564462      |
| 1000 select single table integer field < $j              | 0,8627   | 0,8759   | 0,8673   | 0,8713   | 0,8767   | 0,870787334      |
| 1000 insert multi_table                                  | 446,3830 | 445,2843 | 440,8151 | 436,6051 | 446,0302 | 443,023531816    |
| 1000 select multi table LIKE '%key%' on char(250) field  | 1,7048   | 1,6822   | 1,6817   | 1,7041   | 1,6840   | 1,691367196      |
| 1000 select multi table LIKE '%key%' on char(25) field   | 0,9391   | 0,9365   | 0,9382   | 0,9431   | 0,9408   | 0,939536426      |
| 1000 select multi table id = $n                          | 0,9336   | 0,9287   | 0,9349   | 0,9331   | 0,9428   | 0,93460784       |
| 1000 select multi table integer field < $j               | 2,3366   | 2,3260   | 2,3134   | 2,3342   | 2,3228   | 2,326600456      |
It may be worthwhile investigating what is possible with NoSQL databases. I haven't used them much myself, but given you say you need to "...store content types (eg. Blog articles, Pages, Documents, Invoices, Estimates, etc..) with dynamic fields" it seems as though it may be a reasonable approach.
From the Wikipedia article:
...These data stores may not require fixed table schemas, usually avoid join operations, and typically scale horizontally.
and
Often, NoSQL databases are categorized according to the way they store the data and it falls under categories such as Key-Value stores, BigTable Implementations, Document-Store databases and Graph Database.
I'm not saying it is the answer to all your problems, but I'd certainly say it's worth a look.
With regards to other approaches, I've used Entity-Attribute-Value (EAV) in the past, and while the performance probably lags behind having a fixed schema, I feel it is a compromise that had to be made to afford the flexibility in the schema.
My situation is likely to differ from yours, but I'll lay it out for you in case it is any help. We broke the table structure into something that was logical for our situation. There is a bit of a natural hierarchy, in that there is a parent table which most of the other tables relate to.
Even though we needed dynamic structure due to the variety of the data we are dealing with, there was also some fixed structure. Therefore, for each table requiring dynamic structure, we created a "main" table, and an "attribute" table.
An example of this (SQL Server specific) can be seen below;
CREATE TABLE [dbo].[ParentTbl](
[Id] [int] IDENTITY(1,1) NOT NULL,
[KnownCol1] [real] NOT NULL,
-- Lots of other columns omitted
[KnownColn] [real] NULL
)
CREATE TABLE [dbo].[MainTbl](
[Id] [int] IDENTITY(1,1) NOT NULL,
[ParentId] [int] NOT NULL, -- FK to ParentTbl.Id
[KnownCol1] [real] NOT NULL,
-- Lots of other columns omitted
[KnownColn] [real] NULL
)
CREATE TABLE [dbo].[MainTblAttr](
[Id] [bigint] IDENTITY(1,1) NOT NULL, -- Note big int to cater for LOTS of records
[MainId] [int] NOT NULL, --FK to MainTbl.Id
[AttributeColumn] [nvarchar](255) NOT NULL,
[AttributeValue] [nvarchar](max) NOT NULL
)
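To make the shape of that attribute table concrete, each dynamic field simply becomes one row per record, e.g. (the field names and values here are purely illustrative):

INSERT INTO [dbo].[MainTblAttr] ([MainId], [AttributeColumn], [AttributeValue])
VALUES (1, N'total_price', N'123.45'),        -- everything is stored as text
       (1, N'notes', N'Follow-up estimate');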
You can then perform a PIVOT query to help get your data out. Given that you will have different attributes, you need to determine which columns to include in the pivot. I found this example invaluable when I was developing my solution, but there are loads of examples on SO; just search for "pivot dynamic columns".
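As a rough sketch of what that can look like against the tables above (SQL Server specific; the attribute names are discovered at runtime, so the column list is built dynamically):

DECLARE @cols nvarchar(max), @sql nvarchar(max);

-- Build the column list from the attribute names actually present
SELECT @cols = STUFF((
    SELECT DISTINCT ',' + QUOTENAME(AttributeColumn)
    FROM dbo.MainTblAttr
    FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'), 1, 1, '');

-- One row per MainTbl record, one column per attribute
-- (values cast to a fixed length for the pivot aggregate)
SET @sql = N'SELECT MainId, ' + @cols + N'
    FROM (SELECT MainId, AttributeColumn,
                 CAST(AttributeValue AS nvarchar(4000)) AS AttributeValue
          FROM dbo.MainTblAttr) src
    PIVOT (MAX(AttributeValue) FOR AttributeColumn IN (' + @cols + N')) p;';

EXEC sp_executesql @sql;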
In my instance, having a parent table is a big help in limiting the amount of data I need to trawl through as it limits the child records that I need to look at. This might not be so in your case, but hopefully this will give you some ideas.
Best of luck.