Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does PostgreSQL consider NULL boundaries in range types to be distinct from infinite boundaries?

Tags:

Just to preface, I'm not asking what the difference is between a NULL boundary and an infinite boundary - that's covered in this other question. Rather, I'm asking why PostgreSQL makes a distinction between NULL and infinite boundaries when (as far as I can tell) they function exactly the same.

I started using PostgreSQL's range types recently, and I'm a bit confused by what NULL values in range types are supposed to mean. The documentation says:

The lower bound of a range can be omitted, meaning that all values less than the upper bound are included in the range, e.g., (,3]. Likewise, if the upper bound of the range is omitted, then all values greater than the lower bound are included in the range. If both lower and upper bounds are omitted, all values of the element type are considered to be in the range.

This suggests to me that omitted boundaries in a range (which are the equivalent NULL boundaries specified in a range type's constructor) should be considered infinite. However, PostgreSQL makes a distinction between NULL boundaries and infinite boundaries. The documentation continues:

You can think of these missing values [in a range] as +/-infinity, but they are special range type values and are considered to be beyond any range element type's +/-infinity values.

This is puzzling. "beyond infinity" doesn't make sense, as the entire point of infinite values is that nothing can be greater than +infinity or less than -infinity. That doesn't break "element in range"-type checks, but it does introduce an interesting case for primary keys that I think most people wouldn't expect. Or at least, I didn't expect it.

Suppose we create a basic table whose sole field is a daterange, which is also the PK:

CREATE TABLE public.range_test
(
    id daterange NOT NULL,
    PRIMARY KEY (id)
);

Then we can populate it with the following data with no problem:

INSERT INTO range_test VALUES (daterange('-infinity','2021-05-21','[]'));
INSERT INTO range_test VALUES (daterange(NULL,'2021-05-21','[]'));

Selecting all the data reveals we have these two tuples:

[-infinity,2021-05-22)
(,2021-05-22)

So the two tuples are distinct, or there would have been a primary key violation. But again, NULL boundaries and infinite boundaries work exactly the same when we're dealing with the actual elements that make up the range. For example, there is no date value X such that the results of X <@ [-infinity,2021-05-22) returns a different result than X <@ (,2021-05-22). This makes sense because NULL values can't have a type of date, so they can't even be compared to the range (and PostgreSQL even converted the inclusive boundary on the lower NULL bound in daterange(NULL,'2021-05-21','[]') to an exclusive boundary, (,2021-05-22) to be doubly sure). But why are two ranges that are identical in every practical way considered distinct?

When I was still in school, I remember overhearing some discussion about the difference between "unknown" and "doesn't exist" - two people who were smarter than me were talking about that in the context of why NULL values often cause issues, and that replacing the singular NULL with separate "unknown" and "doesn't exist" values might solve those issues, but the discussion was over my head at the time. Thinking about this weird feature made me think of that discussion. So is the distinction between "unknown" and "doesn't exist" the reason why PostgreSQL treats NULL and +-infinity as distinct? If so, why are ranges the only types that allow for that distinction in PostgreSQL? And if not, why does PostgreSQL treat functionally-equivalent values as distinct?

like image 979
Nick Muise Avatar asked May 19 '21 20:05

Nick Muise


People also ask

Is Postgress null?

In PostgreSQL, NULL means no value. In other words, the NULL column does not have any value. It does not equal 0, empty string, or spaces. The NULL value cannot be tested using any equality operator like “=” “!=

What is Tsrange PostgreSQL?

Range types are data types representing a range of values of some element type (called the range's subtype). For instance, ranges of timestamp might be used to represent the ranges of time that a meeting room is reserved. In this case the data type is tsrange (short for “timestamp range”), and timestamp is the subtype.

What is Tstzrange?

tsrange contains timestamp without time zone. tstzrange contains timestamp with time zone.


1 Answers

Rather, I'm asking why PostgreSQL makes a distinction between NULL and infinite boundaries when (as far as I can tell) they function exactly the same.

But they do not. NULL is a syntax convenience when used as bound of a range, while -infinity / infinity are actual values in the domain of the range. Abstract values meaning lesser / greater that any other value, but values nonetheless (which can be included or excluded).

Also, NULL works for any range type, while most data types don't have special values like -infinity / infinity. Take integer and int4range for example.

For a better understanding, consider the thread in pgsql-general that a_horse provided:

  • https://www.postgresql.org/message-id/flat/OrigoEmail.bf5.ac6ff6ffeb116aec.13fc29939e0%40prod2#c9fabdc670211364636b733a79a04712

This makes sense because NULL values can't have a type of date, so they can't even be compared to the range

Every data type can be NULL, even domains that are explicitly NOT NULL. See:

  • Why does PostgreSQL allow NULLs in domains that prohibit NULL?

That includes date, of course (like Adrian commented):

test=> SELECT NULL::date, pg_typeof(NULL::date);
 date | pg_typeof 
------+-----------
      | date
(1 row)

But trying to discuss NULL as value (when used as bound of a range) is a misleading approach to begin with. It's not a value.

... (and PostgreSQL even converted the inclusive boundary on the lower NULL bound in daterange(NULL,'2021-05-21','[]') to an exclusive boundary, (,2021-05-22) to be doubly sure).

Again, NULL is not treated as value in the domain of the range. It just serves as convenient syntax to say: "unbounded". No more than that.

like image 86
Erwin Brandstetter Avatar answered Sep 20 '22 08:09

Erwin Brandstetter