I've started working on a project where there is a fairly large table (about 82,000,000 rows) that I think is very bloated. One of the fields is defined as:
consistency character varying NOT NULL DEFAULT 'Y'::character varying
It's used as a boolean, the values should always either be ('Y'|'N').
Note: there is no check constraint, etc.
I'm trying to come up with reasons to justify changing this field. Here is what I have:
Here are my question(s).
boolean
, but also 1 byte for a 'Y'
in UTF-8 (at least that's what I get when I check the length in Python). Is there any other storage overhead here that would be saved?=TRUE
" vs. "='Y'
"?PostgreSQL (unlike Oracle) has a fully-fledged boolean
type. Generally, a "yes/no flag" should be boolean
. That's the proper type to use!
Basically, a boolean
column occupies 1 byte on disk,
while text
or character varying
(quoting the manual here) ...
the storage requirement for a short string (up to 126 bytes) is 1 byte plus the actual string
That's 2 bytes for the simple character. So you can cut storage of that column in half.
Actual storage is more complicated than that. There is some fixed overhead per table, page and row, there is special NULL
storage and some types require data alignment. The overall impact will be very limited - if noticeable at all.
More on how to measure actual space requirement.
Encoding UTF8
doesn't make any difference here. Basic ASCII-characters are bit-compatible with other encodings like LATIN-1
.
In your case, according to your description, you should keep the NOT NULL constraint you already seem to have - independent of the base type.
Will be slightly better in any case with boolean. Besides being slightly smaller, the logic for boolean
is simpler and varchar
or text
are also generally burdened with COLLATION specific rules. But don't expect much for something that simple.
Instead of
WHERE consistency = 'Y'
You could write:
WHERE consistency = TRUE
But, really, you can simplify to just:
WHERE consistency
No further evaluation needed.
Transforming your table is simple:
ALTER TABLE tbl ALTER consistency TYPE boolean
USING CASE consistency WHEN 'Y' THEN TRUE ELSE FALSE END;
This CASE
expression folds everything that is not TRUE
('Y') to FALSE
. The NOT NULL constraint just stays.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With