Just to preface, I'm not asking what the difference is between a NULL boundary and an infinite boundary - that's covered in this other question. Rather, I'm asking why PostgreSQL makes a distinction between NULL and infinite boundaries when (as far as I can tell) they function exactly the same.
I started using PostgreSQL's range types recently, and I'm a bit confused by what NULL values in range types are supposed to mean. The documentation says:
The lower bound of a range can be omitted, meaning that all values less than the upper bound are included in the range, e.g.,
(,3]
. Likewise, if the upper bound of the range is omitted, then all values greater than the lower bound are included in the range. If both lower and upper bounds are omitted, all values of the element type are considered to be in the range.
This suggests to me that omitted boundaries in a range (which are the equivalent NULL boundaries specified in a range type's constructor) should be considered infinite. However, PostgreSQL makes a distinction between NULL boundaries and infinite boundaries. The documentation continues:
You can think of these missing values [in a range] as +/-infinity, but they are special range type values and are considered to be beyond any range element type's +/-infinity values.
This is puzzling. "beyond infinity" doesn't make sense, as the entire point of infinite values is that nothing can be greater than +infinity or less than -infinity. That doesn't break "element in range"-type checks, but it does introduce an interesting case for primary keys that I think most people wouldn't expect. Or at least, I didn't expect it.
Suppose we create a basic table whose sole field is a daterange, which is also the PK:
CREATE TABLE public.range_test
(
id daterange NOT NULL,
PRIMARY KEY (id)
);
Then we can populate it with the following data with no problem:
INSERT INTO range_test VALUES (daterange('-infinity','2021-05-21','[]'));
INSERT INTO range_test VALUES (daterange(NULL,'2021-05-21','[]'));
Selecting all the data reveals we have these two tuples:
[-infinity,2021-05-22)
(,2021-05-22)
So the two tuples are distinct, or there would have been a primary key violation. But again, NULL boundaries and infinite boundaries work exactly the same when we're dealing with the actual elements that make up the range. For example, there is no date
value X such that the results of X <@ [-infinity,2021-05-22)
returns a different result than X <@ (,2021-05-22)
. This makes sense because NULL values can't have a type of date
, so they can't even be compared to the range (and PostgreSQL even converted the inclusive boundary on the lower NULL bound in daterange(NULL,'2021-05-21','[]')
to an exclusive boundary, (,2021-05-22)
to be doubly sure). But why are two ranges that are identical in every practical way considered distinct?
When I was still in school, I remember overhearing some discussion about the difference between "unknown" and "doesn't exist" - two people who were smarter than me were talking about that in the context of why NULL values often cause issues, and that replacing the singular NULL with separate "unknown" and "doesn't exist" values might solve those issues, but the discussion was over my head at the time. Thinking about this weird feature made me think of that discussion. So is the distinction between "unknown" and "doesn't exist" the reason why PostgreSQL treats NULL and +-infinity as distinct? If so, why are ranges the only types that allow for that distinction in PostgreSQL? And if not, why does PostgreSQL treat functionally-equivalent values as distinct?
In PostgreSQL, NULL means no value. In other words, the NULL column does not have any value. It does not equal 0, empty string, or spaces. The NULL value cannot be tested using any equality operator like “=” “!=
Range types are data types representing a range of values of some element type (called the range's subtype). For instance, ranges of timestamp might be used to represent the ranges of time that a meeting room is reserved. In this case the data type is tsrange (short for “timestamp range”), and timestamp is the subtype.
tsrange contains timestamp without time zone. tstzrange contains timestamp with time zone.
Rather, I'm asking why PostgreSQL makes a distinction between NULL and infinite boundaries when (as far as I can tell) they function exactly the same.
But they do not. NULL
is a syntax convenience when used as bound of a range, while -infinity
/ infinity
are actual values in the domain of the range. Abstract values meaning lesser / greater that any other value, but values nonetheless (which can be included or excluded).
Also, NULL
works for any range type, while most data types don't have special values like -infinity
/ infinity
. Take integer
and int4range
for example.
For a better understanding, consider the thread in pgsql-general that a_horse provided:
This makes sense because NULL values can't have a type of date, so they can't even be compared to the range
Every data type can be NULL
, even domains that are explicitly NOT NULL
. See:
That includes date
, of course (like Adrian commented):
test=> SELECT NULL::date, pg_typeof(NULL::date);
date | pg_typeof
------+-----------
| date
(1 row)
But trying to discuss NULL
as value (when used as bound of a range) is a misleading approach to begin with. It's not a value.
... (and PostgreSQL even converted the inclusive boundary on the lower NULL bound in
daterange(NULL,'2021-05-21','[]')
to an exclusive boundary,(,2021-05-22)
to be doubly sure).
Again, NULL
is not treated as value in the domain of the range. It just serves as convenient syntax to say: "unbounded". No more than that.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With