Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Database design question

I have a form where users submit different fields to create events. The number and type of fields requested are different on each form, depending on the category of event. What is the best way of going about designing this database - should the events contain every possible field and simply null the unused fields? Thanks!

like image 493
chalpert Avatar asked Sep 09 '10 18:09

chalpert


1 Answers

If you begin to consider Joel's advice, please go to here.

or here

or here

And if you don't believe any of them, build the 4 tables he mentions. There's only 4, doesn't take long. Then load some data into them... then try to write the queries you want to write...

Changing Column meaning:

This can really screw with cardinality estimates. You dinner plates might be in the 4 - 20 range, the concert seats between 1000 - 2000. Some cardinality calculations look at the spread from min to max and assume and equal distribution (when lacking other statistics)...

From 4 to 2000 means that anywhere GENERIC_COLUMN = n, the % of rows you'll hit is 1/1996th of the total... but really, if you said where EVNT_TYPE = Dinner and GENERIC_COLUMN = n it would REALLY be between 4 and 20, or 1/16th of the total rows... so a huge swing in the card estimate. (This can be fixed with histograms, but the point of showing the automation issues is just to hightlight that if it's an issue to a machine, it's probably not as clean as it could be.)

So if you were to do this (MUCH BETTER than an EAV but...)

I would recommend creating a view for each object.

Table EVENT ( common fields, Generic_Count) View DINNER ( common fields, Generic_Count as Plates) WHERE type = Dinner View CONCERT ( common fields, Generic_Count as Seats) WHERE type = Concert

Then give NO ONE select against EVENT

But this is where you get into trouble by NOT starting with a conceptual data model first.

You'd have an ENTITY for EVENT and another for DINNER which inherits completely from EVENT and another for CONCERT which inherits completely from EVENT. Then you could set a differentiating column in the inheritance object which let's you set the "TYPE" column and then you could even decide how many tables to build with a flick of a switch. 1 table, 2 tables or 3 tables..

At least you can do that in powerDesigner.

Why is DDL considered so 'bad?

The creation of EAV models and questions like this are organized around the idea that DDL is to be avoided. Why ALTER TABLE when you can INSERT a new attribute row? People make poor data model design decisions based on the wrong Utility Function. These functions are things like 'no nullable columns', 'the fewer the tables the better', 'no ddl just to add a new attribute. Insert into Attribute table instead'.

Think of data modeling like this: sculptors will say that the wood or stone already has the figure inside of the block, they are just removing pieces of it to reveal it.

Your problem space already has a data model, it's just your job to discover it... it will have as many tables and columns as it needs. Trying to force it to conform to one of the above utility functions is where things go horribly wrong.

In your case, would you ever like to know all the events you've added in the past 2 weeks? Now think of the possible models. One table per event type would mean summing over n tables to find that answer and with each new event type a new table added and every "All event" query would be changing. You could build a UNION ALL view of those tables but you'd have to remember to add each new table to the view. Debugging through views like that is a pain.

Assuming that you might want a lot of metrics about ALL events, one table makes more sense (At least for some common portion of your event data - Like Event Name, Sponsor ID, Venue ID, event Start Time, event end time, venue available for setup time, etc.) Those field are (let's stipulate) are common to every event.

So now what to do with the other columns? Two options, nullable fields or vertically partition the table. The later is an optimization of the former. And if you read any database optimization books or blogs the major thing I take from them is that premature optimization kills. I see people implementing lots of strategies for problems before they even know if they will have that problem. A coworker had a slow query he wanted me to help with. It was loaded with optimizer hints. I removed them and the SQL screamed... I don't know WHY he hinted it but he was not doing it effectively and I'm pretty sure he never saw an issue so this was all just premature optimization.

Vertical partitioning is something you do when you have large data volumes and you have some frequently accessed data and other data that is not so useful. You can pack a table with a lot fewer blocks if you only pack some of the commons. More rows per block = faster tablescans... doesn't really affect the speed of finding a single row via an index. As you can see vertical partitioning has a specific problem it can solve (others too like row chaining) so if you're sure that's GOING to be an issue then by all means begin that way.

like image 112
Stephanie Page Avatar answered Nov 07 '22 05:11

Stephanie Page