
SQL Server - Performance/Size Drawbacks of Null Columns

I'm working on a table design that could involve many NULL values: about 10 fields would be unused maybe 75% of the time.

I just generated some fake data (a million records) and could not detect any impact on SQL Server 2005. The size difference was in the KB range. Performance: no measurable difference after adding an index to the 3 non-nullable columns.

I know SQL Server 2008 has the sparse columns feature (which I assume is going to be used on the next SharePoint's UserData table). I want my code to work on 2005, though. But lots of NULL values exist in the design of the current SharePoint UserData table. So if it's good enough for Microsoft...

Any good articles, links, or white papers on the drawbacks or pain points around many NULL values in a SQL Server table? Does anyone have experience with what happens as you scale to 10 million or 100 million records?

BuddyJoe asked Mar 10 '09 21:03


2 Answers

I have never had a performance problem with multiple nullable columns, even on databases hundreds of gigabytes in size. I imagine you could run into issues if you index these fields and then search for NULL in your queries, but I have not seen this as a problem personally. Then again, I have not created database tables where every field except 3 was nullable.

On the other hand, I see an architecture problem when most of the data is null. The usual reason is either a) an improperly normalized database or b) an attempt to let users stage data in the end table rather than creating separate tables to "build" data prior to committing it to the database.

It is up to you to determine the best architecture of your database.

Gregory A Beamer answered Sep 24 '22 07:09


What I do in this situation, which is very common, is to split the data up into two tables:

  • Required Data
  • Optional Data

For example, I'm currently writing a community website and one of the tables will obviously be a user table. I am recording a large amount of information about users and so I have split the data I collect into two tables:

  • Users
  • UserDetails

The Users table contains basic information that I will need all the time such as Username, Name and Session Information.

The UserDetails table contains extra information which I don't need as often, such as Profile Page, Email Address, Password, Website Address, Date of Birth and so on.

This is known as vertical partitioning.
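A minimal sketch of that split might look like the following. It uses Python's built-in sqlite3 module purely as a runnable stand-in, so the types and DDL details are not SQL Server's. The column names (UserId, EmailAddress and so on) and the load_profile helper are assumptions made for illustration, not the schema from the site described above.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
-- Required data: every user has a row here and no column is nullable.
CREATE TABLE Users (
    UserId   INTEGER PRIMARY KEY,
    Username TEXT NOT NULL UNIQUE,
    Name     TEXT NOT NULL
);

-- Optional data: a row exists only for users who supplied something.
-- Sharing the primary key with Users makes this a 1:0..1 relationship.
CREATE TABLE UserDetails (
    UserId         INTEGER PRIMARY KEY REFERENCES Users(UserId),
    EmailAddress   TEXT,
    WebsiteAddress TEXT,
    DateOfBirth    TEXT
);
""")

# Hypothetical sample rows: alice filled in a detail, bob did not.
conn.execute("INSERT INTO Users (UserId, Username, Name) VALUES (1, 'alice', 'Alice')")
conn.execute("INSERT INTO Users (UserId, Username, Name) VALUES (2, 'bob', 'Bob')")
conn.execute(
    "INSERT INTO UserDetails (UserId, EmailAddress) VALUES (1, 'alice@example.com')"
)


def load_profile(conn, user_id):
    """Return the required columns plus any optional details for one user.

    The LEFT JOIN keeps users that have no UserDetails row; their optional
    columns simply come back as None.
    """
    return conn.execute(
        """
        SELECT u.Username, u.Name, d.EmailAddress, d.WebsiteAddress
        FROM Users AS u
        LEFT JOIN UserDetails AS d ON d.UserId = u.UserId
        WHERE u.UserId = ?
        """,
        (user_id,),
    ).fetchone()
```

Queries that only need the required data never touch UserDetails, and a user with no optional data costs no row there at all. The trade-off is the extra join whenever the full profile is needed.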

GateKiller answered Sep 24 '22 07:09