 

Are nulls in a relational database okay? [closed]

People also ask

Why should nulls in a relation be avoided?

They are often avoided because they complicate SELECT and UPDATE queries (a comparison with NULL is neither true nor false but unknown), and because primary key columns cannot contain a NULL value. Foreign key columns, unlike primary key columns, are allowed to be NULL.

Are nulls allowed in SQL?

By default, a column can hold NULL values. The NOT NULL constraint forces a column to reject NULL values, so the field must always contain a value. You cannot insert a new record, or update an existing one, without supplying a value for that field (unless the column has a default).
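The sketch below shows this behaviour. It uses SQLite through Python's standard `sqlite3` module as a stand-in for whatever DBMS you use; error messages and some details differ between products. The `person` table and its columns are hypothetical:

```python
import sqlite3

def try_sql(conn: sqlite3.Connection, sql: str) -> str:
    """Run one statement and report 'ok' or the constraint error message."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)

conn = sqlite3.connect(":memory:")
# email must always hold a value; nickname may be NULL (the default).
conn.execute("CREATE TABLE person (email TEXT NOT NULL, nickname TEXT)")

results = [
    # nickname is nullable, so omitting it simply stores NULL
    try_sql(conn, "INSERT INTO person (email) VALUES ('a@example.com')"),
    # email is NOT NULL, so omitting it on INSERT is rejected
    try_sql(conn, "INSERT INTO person (nickname) VALUES ('al')"),
    # ...and so is setting it to NULL on UPDATE
    try_sql(conn, "UPDATE person SET email = NULL"),
]
for r in results:
    print(r)
```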

Are NULL values supported in relational model?

In terms of the relational database model, a NULL value indicates an unknown value. Crucially, this unknown value is not equivalent to zero, to an empty string, or to a field that contains spaces.
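You can see this directly by comparing NULL with the values it is often confused with. The sketch runs in SQLite via Python's `sqlite3`, where an unknown comparison result comes back as `None`. The standard way to test for a missing value is `IS NULL`, not `=`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
row = conn.execute(
    """
    SELECT
        NULL = NULL,   -- unknown, not true: SQLite returns NULL
        NULL = 0,      -- unknown: NULL is not zero
        NULL = '',     -- unknown: NULL is not an empty string
        NULL = ' ',    -- unknown: NULL is not a string of spaces
        NULL IS NULL,  -- the proper test for a missing value: 1 (true)
        0 IS NULL      -- zero is a real, known value: 0 (false)
    """
).fetchone()
print(row)  # (None, None, None, None, 1, 0)
```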

Should I allow nulls SQL Server?

In SQL, you should, in most circumstances, specify explicitly whether a column allows NULL values. It isn't a good idea to rely on defaults and assume that a column will be nullable just because you didn't write NULL or NOT NULL in its definition. In SQL Server, for example, the default nullability depends on session and database settings such as ANSI_NULL_DFLT_ON.
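One way to follow this advice is to write NULL or NOT NULL on every column and then confirm the result in the catalog. The sketch below does this in SQLite via Python's `sqlite3`, reading `PRAGMA table_info`. The `customer` table is hypothetical. Other products expose the same information differently, for example `INFORMATION_SCHEMA.COLUMNS.IS_NULLABLE`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer (
        id    INTEGER NOT NULL PRIMARY KEY,
        name  TEXT    NOT NULL,  -- required, stated explicitly
        phone TEXT    NULL       -- optional, also stated explicitly
    )
    """
)
# Each row of PRAGMA table_info is (cid, name, type, notnull, dflt_value, pk).
nullability = {
    col[1]: ("NOT NULL" if col[3] else "NULL")
    for col in conn.execute("PRAGMA table_info(customer)")
}
print(nullability)  # {'id': 'NOT NULL', 'name': 'NOT NULL', 'phone': 'NULL'}
```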


Nulls are viewed negatively from the perspective of database normalization. The idea is that if a value can be nothing, you really should split it out into a separate sparse table, so that you don't need rows for items that have no value.
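A minimal sketch of that decomposition, using SQLite via Python's `sqlite3`. The `contact` and `fax` names are hypothetical. The wide table stores a NULL when there is no fax. The decomposed design stores no NULLs at all: a missing fax is simply a missing row in the sparse table. Reassembling the wide view with a LEFT JOIN yields the same result as the wide table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Nullable design: one wide table, fax is NULL when there is none.
    CREATE TABLE contact_wide (id INTEGER PRIMARY KEY, name TEXT NOT NULL, fax TEXT);
    INSERT INTO contact_wide VALUES (1, 'Ann', '555-0100'), (2, 'Bob', NULL);

    -- Decomposed design: no NULLs stored; a missing fax is a missing row.
    CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE contact_fax (
        contact_id INTEGER PRIMARY KEY REFERENCES contact(id),
        fax        TEXT NOT NULL
    );
    INSERT INTO contact VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO contact_fax VALUES (1, '555-0100');
    """
)
wide = conn.execute("SELECT id, name, fax FROM contact_wide ORDER BY id").fetchall()
# Reassembling the wide view needs a LEFT JOIN, which reintroduces the NULL.
joined = conn.execute(
    """
    SELECT c.id, c.name, f.fax
    FROM contact AS c LEFT JOIN contact_fax AS f ON f.contact_id = c.id
    ORDER BY c.id
    """
).fetchall()
print(wide == joined)  # True: same information, different storage
```

The extra table and the join are the cost that the replies below weigh against the benefit.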

It's an effort to make sure all data is valid and valued.

In some cases having a null field is useful, though, especially when you want to avoid yet another join for performance reasons (although this shouldn't be an issue if the database engine is set up properly, except in extraordinarily high-performance scenarios).

-Adam


One argument against nulls is that they don't have a well-defined interpretation. If a field is null, that could be interpreted as any of the following:

  • The value is "Nothing" or "Empty set"
  • There is no value that makes sense for that field.
  • The value is unknown.
  • The value hasn't been entered yet.
  • The value is an empty string (for databases that don't distinguish between nulls and empty strings).
  • Some application-specific meaning (e.g., "If the value is null, then use a default value.")
  • An error has occurred, causing the field to have a null value when it really shouldn't.

Some schema designers demand that all values and data types have well-defined interpretations; by that standard, nulls are bad.


It depends.

As long as you understand why you are allowing NULLs in the database (the choice needs to be made on a per-column basis) AND how you will interpret, ignore or otherwise deal with them, they are fine.

For instance, take a column like NUM_CHILDREN: if you don't know the answer, it should be NULL. In my mind, there is no better option for this column's design (even if you have a flag to indicate whether the NUM_CHILDREN column is valid, you still have to put some value in this column).

On the other hand, if you don't allow NULLs and have special reserved values for certain cases (instead of flags), like -1 for number of children when it is really unknown, you have to address these in a similar way, in terms of conventions, documentation, etc.
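The sketch below makes that trade-off concrete. It shows how an aggregate treats a NULL versus a -1 sentinel for NUM_CHILDREN, using SQLite via Python's `sqlite3` with hypothetical tables. Aggregates such as AVG skip NULLs. A sentinel is just another number to SQL, so every query has to know to exclude it. Either way, some convention has to be followed consistently:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Two hypothetical encodings of the same facts: person 3's count is unknown.
    CREATE TABLE with_null     (person_id INTEGER, num_children INTEGER);
    CREATE TABLE with_sentinel (person_id INTEGER, num_children INTEGER NOT NULL);
    INSERT INTO with_null     VALUES (1, 2), (2, 4), (3, NULL);
    INSERT INTO with_sentinel VALUES (1, 2), (2, 4), (3, -1);
    """
)
# AVG ignores NULLs, so the unknown value does not distort the result.
avg_null = conn.execute("SELECT AVG(num_children) FROM with_null").fetchone()[0]
# The sentinel -1 is averaged in unless every query remembers to filter it.
avg_naive = conn.execute("SELECT AVG(num_children) FROM with_sentinel").fetchone()[0]
avg_filtered = conn.execute(
    "SELECT AVG(num_children) FROM with_sentinel WHERE num_children <> -1"
).fetchone()[0]
print(avg_null, avg_naive, avg_filtered)  # 3.0 1.6666666666666667 3.0
```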

So, ultimately, the issues have to be addressed with conventions, documentation and consistency.

The alternative apparently espoused by Adam Davis in the answer above is normalizing the columns out into sparse tables (or not-so-sparse ones, in the case of the NUM_CHILDREN example or any example where most of the data has known values). That approach can eliminate all NULLs, but it is unworkable in general practice.

In many cases where an attribute is unknown, it makes little sense to join to another table for each and every column that could allow NULLs in a simpler design. The overhead of the joins and the space required for the extra primary keys make little sense in the real world.

This brings to mind the way duplicate rows can be eliminated by adding a cardinality column. While that theoretically solves the problem of not having a unique key, in practice it is sometimes impossible, for instance with large-scale data. The purists are then quick to suggest a surrogate PK instead, yet the idea that a meaningless surrogate can form part of a tuple (row) in a relation (table) is laughable from the point of view of relational theory.
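Here is a small sketch of the cardinality-column idea, for readers unfamiliar with it. It uses SQLite via Python's `sqlite3`; `page_hit` and its columns are hypothetical. Duplicate rows are collapsed into one row each plus a count, which turns the remaining columns into a usable key. This is a one-off collapse. The caveat above is about doing this in practice, e.g. with large-scale data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- A hypothetical log with genuinely duplicate rows and no natural key.
    CREATE TABLE page_hit_raw (page TEXT NOT NULL, day TEXT NOT NULL);
    INSERT INTO page_hit_raw VALUES
        ('/home', '2024-01-01'), ('/home', '2024-01-01'), ('/about', '2024-01-01');

    -- Duplicates collapsed into one row plus a cardinality (count) column,
    -- so (page, day) can now serve as the key.
    CREATE TABLE page_hit (
        page TEXT NOT NULL,
        day  TEXT NOT NULL,
        cardinality INTEGER NOT NULL,
        PRIMARY KEY (page, day)
    );
    INSERT INTO page_hit (page, day, cardinality)
        SELECT page, day, COUNT(*) FROM page_hit_raw GROUP BY page, day;
    """
)
hits = conn.execute("SELECT * FROM page_hit ORDER BY page").fetchall()
print(hits)  # [('/about', '2024-01-01', 1), ('/home', '2024-01-01', 2)]
```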


Null markers are fine. Really, they are.


There are several different objections to the use of NULL. Some of the objections are based on database theory. In theory, there is no difference between theory and practice. In practice, there is.

It is true that a fully normalized database can get along without NULLS at all. Any place where a data value has to be left out is a place where an entire row can be left out with no loss of information.

In practice, decomposing tables to this extent serves little useful purpose, and the programming needed to perform simple CRUD operations on the database becomes more tedious and error-prone, rather than less.

There are places where the use of NULLS can cause problems. Essentially, these revolve around one question: what does missing data really mean? All a NULL really conveys is that there is no value stored in a given field. But the inferences application programmers draw from missing data are sometimes incorrect, and that causes a lot of problems.

Data can be missing from a location for a variety of reasons. Here are a few:

  1. The data is inapplicable in this context. e.g. spouse's first name for a single person.

  2. The user of a data entry form left a field blank, and the application does not require an entry in the field.

  3. The data is copied to the database from some other database or file, and there was missing data in the source.

  4. There is an optional relationship encoded in a foreign key.

  5. An empty string was stored in an Oracle database (Oracle treats an empty string as NULL).

Here are some guidelines about when to avoid NULLS:

If, in the course of normal expected programming, query writers have to write a lot of ISNULL, NVL, COALESCE, or similar code in order to substitute a valid value for the NULL. Sometimes it's better to make the substitution at store time, provided what's being stored is "reality".
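The sketch below contrasts the two options, using SQLite via Python's `sqlite3`; the `order_line` tables are hypothetical. SQLite spells the substitution functions COALESCE and IFNULL, while ISNULL is SQL Server's and NVL is Oracle's. Store-time substitution only makes sense if "no discount" really does mean a discount of zero:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE order_line (id INTEGER PRIMARY KEY, qty INTEGER, discount REAL);
    INSERT INTO order_line VALUES (1, 3, 0.10), (2, 5, NULL);
    """
)
# Query-time substitution: every query that touches discount must repeat COALESCE.
rows = conn.execute(
    "SELECT id, COALESCE(discount, 0.0) FROM order_line ORDER BY id"
).fetchall()
print(rows)  # [(1, 0.1), (2, 0.0)]

# Store-time substitution: store the real value and forbid NULL outright.
conn.executescript(
    """
    CREATE TABLE order_line2 (
        id INTEGER PRIMARY KEY,
        qty INTEGER,
        discount REAL NOT NULL DEFAULT 0.0
    );
    INSERT INTO order_line2 (id, qty) VALUES (2, 5);
    """
)
stored = conn.execute("SELECT discount FROM order_line2").fetchone()
print(stored)  # (0.0,)
```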

If counts are likely to be off because rows containing a NULL were counted. Often, this can be obviated by just selecting count(MyField) instead of count(*).
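A sketch of the difference, using SQLite via Python's `sqlite3`; the `survey` table is hypothetical. COUNT(*) counts rows, while COUNT(column) counts only the rows where that column is not NULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE survey (respondent INTEGER PRIMARY KEY, age INTEGER);
    INSERT INTO survey VALUES (1, 34), (2, NULL), (3, 51);
    """
)
count_star, count_age = conn.execute(
    "SELECT COUNT(*), COUNT(age) FROM survey"
).fetchone()
print(count_star, count_age)  # 3 2
```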

Here is one place where you by golly better get used to NULLS, and program accordingly: whenever you start using outer joins, like LEFT JOIN and RIGHT JOIN. The whole point behind an outer join as distinct from an inner join is to get rows when some matching data is missing. The missing data will be given as NULLS.
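The sketch below shows NULLs appearing even though no table stores any. It uses SQLite via Python's `sqlite3` with hypothetical `customer` and `purchase` tables. It also shows a common idiom that relies on those NULLs: finding rows with no match:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE purchase (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        amount REAL NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO purchase VALUES (10, 1, 25.0);
    """
)
# Neither table holds a NULL, yet the LEFT JOIN produces one for Bob.
rows = conn.execute(
    """
    SELECT c.name, p.amount
    FROM customer AS c LEFT JOIN purchase AS p ON p.customer_id = c.id
    ORDER BY c.id
    """
).fetchall()
print(rows)  # [('Ann', 25.0), ('Bob', None)]

# Customers with no purchases: keep only the rows where the join found nothing.
no_purchases = conn.execute(
    """
    SELECT c.name
    FROM customer AS c LEFT JOIN purchase AS p ON p.customer_id = c.id
    WHERE p.id IS NULL
    """
).fetchall()
print(no_purchases)  # [('Bob',)]
```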

My bottom line: don't dismiss theory without understanding it. But learn when to depart from theory as well as how to follow it.


There is nothing wrong with using NULL for data fields. You have to be careful when setting keys to null. Primary keys should never be NULL. Foreign keys can be null but you have to be careful not to create orphan records.
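A minimal sketch of a nullable foreign key, using SQLite via Python's `sqlite3`; the `department`/`employee` tables are hypothetical. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, which is off by default. Once enforcement is on, a NULL reference is accepted, while a dangling reference or a delete that would orphan a child row is rejected:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is on (off by default).
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE employee (
        id            INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        department_id INTEGER REFERENCES department(id)  -- nullable: optional link
    );
    INSERT INTO department VALUES (1, 'Sales');
    """
)

def attempt(sql: str) -> str:
    """Run one statement and report 'ok' or why it was rejected."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

outcomes = [
    attempt("INSERT INTO employee VALUES (1, 'Ann', 1)"),     # valid link
    attempt("INSERT INTO employee VALUES (2, 'Bob', NULL)"),  # no department: allowed
    attempt("INSERT INTO employee VALUES (3, 'Cy', 99)"),     # dangling reference
    attempt("DELETE FROM department WHERE id = 1"),           # would orphan Ann
]
for outcome in outcomes:
    print(outcome)
```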

If something is "non-existent", then you should use NULL instead of an empty string or some other kind of flag.