I have a table with 25 columns where 20 columns can have null values for some (30-40%) rows. Now what is the cost of having rows with 20 null columns? Is this OK?
Or
is it a good design to have another table to store those 20 columns and add a ref to the first table? This way I will only write to the second table only when there is are values.
I am using SQL server 2005. Will migrate to 2008 in future.
Only 20 columns are varchar, rest smallint, smalldate
What I am storing: These columns store different attributes of the row it belongs to. These attributes can be null sometimes.
The table will hold ~billion of rows
Please comment.
You should describe the type of data you are storing. It sounds like some of those columns should be moved to another table.
For example, if you have several columns that represent multiple columns for the same type of data, then I would say move it to another table On the other hand, if you need this many columns to describe different types of data, then you may need to keep it as it is.
So it kind of depends on what you are modelling.
Are there some circumstances where some of those columns are required? If so, then perhaps you should use some form of inheritance. For instance, if this were information about patients in a hospital, and there was some data that only made sense for female patients, then you could create a FemalePatients table with those columns. Those columns that must always be collected for female patients could then be declared NOT NULL
in that separate table.
It depends on the data types (40 nullable ints is going to basically take the same space as 40 non-nullable ints, regardless of the values). In SQL Server, the space is fairly efficient with ordinary techniques. In 2008, you do have the SPARSE feature.
If you do split the table vertically with an optional 1:1 relationship, there is a possibility of wrapping the two tables with a view and adding triggers on the view to make it updatable and hide the underlying implementation.
So there are plenty of options, many of which can be implemented after you see the data load and behavior.
Create tables based on the distinct sets of attributes you have. So if you have some data where some of your columns do not apply then it would make sense to have that data in a table which doesn't have those columns. As far as possible, avoid repeating the same attribute in multiple tables. Make sure your data is in at least Boyce-Codd / 5th Normal Form and you won't go far wrong.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With