Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Creating a db table NULL best practices

Not sure on what the best practices are for dealing with NULL values when I have a single table where two fields are only sometimes populated creating a lot of NULL values in the rows.

Should the two fields be moved to a separate table creating two tables with no NULL values?

A join across these two tables would just return a result that equals my original table with the NULL's, so what's the point in that?

Seems pointless to separate them but I have been reading a bit about avoiding null's all together in the db.

Any thoughts welcome.

like image 773
baileyswalk Avatar asked Dec 13 '10 16:12

baileyswalk


Video Answer


2 Answers

  1. Purely theoretically, a NULL is supposed to mean "unknown value". So - again, purely theoretically - you should design your tables when normalized so that you don't need to fill out NULL values to mean "not applicable for this row". However, this point has pretty much no relation to any practical consideration (design, performance, or query readability).

  2. Practically, there are some performance considerations. You should normalize away very sparse data in the following cases:

    • There is material benefit from shortening the table (both IO wise and/or space wise). NULLs do take space, and the wider the rows the worse the performance. This is especially true when the table has a LOT of rows and there are many such sparse columns. For smaller table with only 2 such columns the benefits realized might not be worth the trouble of having an extra join.

    • Your queries have the column in question in the WHERE clause. IIRC, querying on a heavily NULL-ed column is rather inefficient.

    • On the other hand, at certain point, having extra joins in the query might hurt the optimizer performance (at least it does so on Sybase once your joins have 10+ tables - from taking up CPU resources when optimizer runs to actually confusing the optimizer to pick a VERY bad plan). The solution is to avoid having too-many tables due to normalization (as in, don't bother splitting your 2 columns into a separate table), or forcing the query plan. The latter is obviously Bad Juju.

like image 152
DVK Avatar answered Sep 23 '22 08:09

DVK


Nulls cause incorrect and inconsistent results in queries and generally increase code complexity due to the special handling needed in code that has to process them. For these reasons it usually makes sense to avoid or minimise nulls in your database designs. You don't need to use nulls in queries either - although SQL unfortunately makes them very difficult to avoid. However, by not using nulls in base tables you will ensure that your data model more accurately reflects reality and you will give database users more control over how they want nulls to be used.

like image 29
nvogel Avatar answered Sep 21 '22 08:09

nvogel