
Nullable vs. non-null varchar data types - which is faster for queries?

Tags:

sql

We generally prefer to have all our varchar/nvarchar columns non-nullable, with an empty string ('') as the default value. Someone on the team suggested that nullable is better because:

A query like this:

Select * From MyTable Where MyColumn IS NOT NULL

is faster than this:

Select * From MyTable Where MyColumn = ''

Does anyone have experience that would confirm whether this is true?

Randy Minder asked Jun 19 '10 15:06




2 Answers

On some platforms (and even versions), this is going to depend on how NULLs are indexed.

My basic rule of thumb for NULLs is:

  1. Don't allow NULLs until justified

  2. Don't allow NULLs unless the data can really be unknown

A good example of this is modeling address lines. If you have an AddressLine1 and AddressLine2, what does it mean for the first to have data and the second to be NULL? It seems to me that you either know the address or you don't, and having partial NULLs in a set of data just asks for trouble when somebody concatenates them and gets NULL (ANSI behavior). You might solve this by allowing NULLs and adding a check constraint - either all the address information is NULL or none of it is.
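The concatenation problem and the all-or-none check constraint can be sketched roughly as follows. This is only an illustration under assumptions: it uses SQLite through Python's sqlite3 module as a stand-in (the answer is not tied to any one engine), and the Customer table and the try_insert helper are hypothetical names that do not come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: the CHECK lets the address lines be NULL only together.
conn.execute("""
    CREATE TABLE Customer (
        Id INTEGER PRIMARY KEY,
        AddressLine1 TEXT,
        AddressLine2 TEXT,
        CHECK ((AddressLine1 IS NULL) = (AddressLine2 IS NULL))
    )
""")

# || is the ANSI concatenation operator; a NULL operand makes the result NULL.
concat_with_null = conn.execute(
    "SELECT '12 Main St' || ' ' || NULL"
).fetchone()[0]


def try_insert(line1, line2):
    """Return True if the row satisfies the all-or-none CHECK constraint."""
    try:
        conn.execute(
            "INSERT INTO Customer (AddressLine1, AddressLine2) VALUES (?, ?)",
            (line1, line2),
        )
        return True
    except sqlite3.IntegrityError:
        return False


both_known = try_insert("12 Main St", "")   # address known, no second line
both_unknown = try_insert(None, None)       # address entirely unknown
partial = try_insert("12 Main St", None)    # partial NULL, rejected by the CHECK
```

Here an empty string in AddressLine2 means "known to have no second line", while NULL in both columns means "address unknown", which is the distinction the rule of thumb is trying to protect.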

Similar thing with middle initial/name. Some people don't have one. Is this different from it being unknown and do you care?

Also, consider date of death - what does NULL mean? Not dead? Unknown date of death? Many times a single column is not sufficient to encode knowledge in a domain.
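One way to picture the date-of-death ambiguity is to add a second column that records what is actually known. The sketch below is an assumption-laden illustration, not a recommended schema: the Person table, the IsDeceased column, and the sample rows are all hypothetical, and SQLite (via Python's sqlite3) stands in for whatever engine is in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: IsDeceased records the known status, DateOfDeath the date.
# The CHECK forbids recording a date for someone not known to be dead.
conn.execute("""
    CREATE TABLE Person (
        Name TEXT NOT NULL,
        IsDeceased INTEGER,      -- 1 = dead, 0 = alive, NULL = status unknown
        DateOfDeath TEXT,        -- NULL = no date recorded
        CHECK (DateOfDeath IS NULL OR IsDeceased = 1)
    )
""")
conn.executemany(
    "INSERT INTO Person VALUES (?, ?, ?)",
    [
        ("Ann", 0, None),           # alive
        ("Bob", 1, "2009-03-14"),   # dead, date known
        ("Cy", 1, None),            # dead, date unknown
        ("Di", None, None),         # status unknown
    ],
)

# With DateOfDeath alone, Ann, Cy and Di cannot be told apart.
null_dates = conn.execute(
    "SELECT COUNT(*) FROM Person WHERE DateOfDeath IS NULL"
).fetchone()[0]

# The second column separates "dead, date unknown" from the other cases.
dead_date_unknown = [
    row[0]
    for row in conn.execute(
        "SELECT Name FROM Person WHERE IsDeceased = 1 AND DateOfDeath IS NULL"
    )
]
```

Three of the four rows have a NULL DateOfDeath for three different reasons, and only the extra column makes those reasons queryable.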

So to me, whether to allow NULLs would depend very much on the semantics of the data first - performance is going to be second, because having data misinterpreted (potentially by many different people) is usually a far more expensive problem than performance.

It might seem like a little thing (in SQL Server the implementation is a bitmask stored with the row), but only allowing NULLs after justification seems to me to work best. It catches things early in development, forces you to address assumptions and understand your problem domain.

Cade Roux answered Oct 14 '22 08:10


If you want to know that there is no value, use NULL.

As for speed, IS NULL should be faster, because it doesn't need a string comparison.
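Rather than taking a claim like this on faith, it can be measured on the engine you actually use. The following is a minimal timing sketch under stated assumptions: it runs against an in-memory SQLite database rather than SQL Server, the NullableT and NonNullT tables and the measure helper are hypothetical, no indexes are defined, and the timings it prints say nothing about how another engine or an indexed column would behave.

```python
import sqlite3
import timeit

conn = sqlite3.connect(":memory:")

# Two hypothetical tables holding the same data: one marks "no value" with NULL,
# the other with an empty string in a NOT NULL column.
conn.execute("CREATE TABLE NullableT (MyColumn TEXT)")
conn.execute("CREATE TABLE NonNullT (MyColumn TEXT NOT NULL DEFAULT '')")

# Every tenth row has no value.
values = [None if i % 10 == 0 else f"value {i}" for i in range(100_000)]
conn.executemany("INSERT INTO NullableT VALUES (?)", [(v,) for v in values])
conn.executemany("INSERT INTO NonNullT VALUES (?)", [(v or "",) for v in values])
conn.commit()

QUERIES = {
    "IS NULL": "SELECT COUNT(*) FROM NullableT WHERE MyColumn IS NULL",
    "= ''": "SELECT COUNT(*) FROM NonNullT WHERE MyColumn = ''",
}


def measure(sql, repeat=5):
    """Return (matching row count, best wall-clock seconds over `repeat` runs)."""
    count = conn.execute(sql).fetchone()[0]
    best = min(
        timeit.repeat(lambda: conn.execute(sql).fetchall(), number=1, repeat=repeat)
    )
    return count, best


results = {label: measure(sql) for label, sql in QUERIES.items()}
for label, (count, seconds) in results.items():
    print(f"{label:8} rows={count} best={seconds:.4f}s")
```

Both queries select the same 10,000 rows, so any difference in the printed times comes from how the predicate is evaluated. To test the original question's form, swap in IS NOT NULL and <> '' and add whatever indexes your real table has.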

Mewp answered Oct 14 '22 08:10