Generally I like to keep my database as clean and expandable as possible.
However, after doing some tests, I realised that while this is usually the best approach, with large datasets it is a lot slower than what I refer to as the "dirty" approach to the problem.
Basically, let's say I have a table of objects. These objects belong to certain people: one object may belong to a single person, while others belong to more than one. My initial thought was to do what I always do: create an objects table for my objects, a people table for my people, and then an object_to_people linker table.
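For concreteness, the normalized layout would look roughly like this (MySQL-style DDL; the column names are just illustrative):

```sql
-- Normalized layout: objects, people, and a many-to-many linker table.
CREATE TABLE objects (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255) NOT NULL
);

CREATE TABLE people (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255) NOT NULL
);

CREATE TABLE object_to_people (
    object_id INT UNSIGNED NOT NULL,
    person_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (object_id, person_id),
    -- lets "all objects for a given person" lookups be resolved from the index
    KEY idx_person_object (person_id, object_id)
);
```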
However, joining the objects and linker tables to get all objects a person is assigned to can take up to 3 seconds (that's based on around 400k records, with only 1 link per object). Yes, I also set up indexes etc. to try to speed things up.
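The query in question is essentially this (a sketch using the names above; the person id is just an example):

```sql
-- All objects assigned to a given person.
SELECT o.*
FROM object_to_people otp
JOIN objects o ON o.id = otp.object_id
WHERE otp.person_id = 42;
```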
If I instead remove the people and linker tables, put the people in the objects table as columns, and use 1/0 to flag whether each person is assigned to that object, then without joining the two large tables I see times of around 0.3 to 0.7 seconds (it varied greatly).
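The "dirty" version would look something like this (the table and person column names are placeholders; adding a third person would mean an ALTER TABLE):

```sql
-- Denormalized layout: one flag column per person, replacing the linker table.
CREATE TABLE objects_flat (
    id           INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name         VARCHAR(255) NOT NULL,
    person_alice TINYINT(1) NOT NULL DEFAULT 0,
    person_bob   TINYINT(1) NOT NULL DEFAULT 0
);

-- All objects assigned to one person, no join needed.
SELECT * FROM objects_flat WHERE person_alice = 1;
```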
To begin with, we only need 2 people, but I don't want to be too restrictive if I can help it. I know I can use caching and whatnot to improve the end-user timings, but is there any reason why using columns rather than a link table would be considered a really bad idea?
I have a similar setup.
My join table has 17,000,000 rows. My "person" table has 8,400,000 rows, and my "objects" table has 300,000 rows.
I have queries with multiple joins on my join table, plus unions of results, that return tens of thousands of rows, and they take less than 1 second to run (50-400 ms).
I think your first layout could be fine, but you probably need to focus on your indexes and queries.
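For example, I would start by checking the execution plan and making sure the linker table has an index that leads with person_id. A sketch, using the table names from your question (the index name is made up):

```sql
-- See how the join is executed; the linker-table step should use an index,
-- not a full table scan.
EXPLAIN
SELECT o.*
FROM object_to_people otp
JOIN objects o ON o.id = otp.object_id
WHERE otp.person_id = 42;

-- If no index on the linker table leads with person_id, a composite index
-- often makes a big difference for this kind of query.
ALTER TABLE object_to_people
    ADD INDEX idx_person_object (person_id, object_id);
```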