Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Drowning in a Sea of Nulls

An application I inherited tracks lab test results performed on material samples. Data is stored in a single table (tblSampleData) with a primary key of SampleID and 235 columns representing potential test results. The problem is that only a few tests are performed per sample, so each row contains over 200 nulls. Actually, there is a second similar table (tblSampleData2) with another 215 primarily null columns and a primary key of SampleID. The two tables have a one-to-one relationship and most SampleIDs have some data in both tables. For every SampleID, however, there are easily 400 null columns!

Is this bad database design? If so, which normal form rule is broken? How can I query this table to identify which groups of columns are typically filled together with data? My goal would be to have, say 45 tables with 10 columns and fewer null values. How can I do this? How do I avoid breaking existing applications?

The tables have about 200,000 sample records so far. Users are asking me to add more columns for more tests, but I'd rather build a new table. Is this wise?

like image 387
DeveloperDan Avatar asked Jul 01 '10 12:07

DeveloperDan


1 Answers

I have seen articles / papers that indicate that simply having NULLs in the database breaks the first normal form.

From what I've gathered from your description of the database, a better design might be as follows:

A Sample table with fields that are always associated with a sample. For example,

Sample
------ 
SampleID 
SampleDate 
SampleSource

Then, a table of test types with one entry for each type of test that can be performed.

TestType
--------
TestTypeID
TestName
MaximumAllowedValue

Finally, have an intermediate table that represents the many-to-many relationship between the two above tables and holds the results for the tests.

TestResult
----------
SampleID
TestTypeID
TestResult

This would eliminate the null values because the TestResult table would only contain entries for the tests that were actually performed on each sample. I once designed a database for an almost identical purpose to what I believe you are doing and this is the approach I took.

like image 96
DCNYAM Avatar answered Sep 20 '22 18:09

DCNYAM