I'm trying to estimate the anticipated table size from the column types and lengths, and I'm using pg_column_size for this.
When testing the function, I noticed something that seems wrong: the value returned by pg_column_size(...) is sometimes even smaller than the value returned by octet_length(...) on the same string.
The column contains nothing but numeric characters.
postgres=# \d+ t5
Table "public.t5"
Column | Type | Modifiers | Storage | Stats target | Description
--------+-------------------+-----------+----------+--------------+-------------
c1 | character varying | | extended | |
Has OIDs: no
postgres=# select pg_column_size(c1), octet_length(c1) as octet from t5;
pg_column_size | octet
----------------+-------
2 | 1
704 | 700
101 | 7000
903 | 77000
(4 rows)
Is this a bug or something? Does anyone have a formula to calculate the anticipated table size from the column types and their length values?
I'd say pg_column_size is reporting the compressed size of TOASTed values, while octet_length is reporting the uncompressed sizes. I haven't verified this by checking the function source or definitions, but it'd make sense, especially as strings of numbers will compress quite well. You're using EXTENDED storage, so the values are eligible for TOAST compression. See the TOAST documentation.
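To get an intuition for why a 7000-byte string of digits can shrink to ~101 bytes, here's a quick sketch. Note the assumption: PostgreSQL uses its own pglz (or, in newer versions, LZ4) algorithm for TOAST compression, not zlib; zlib is used below purely as a stand-in to show how compressible repetitive digit strings are.

```python
import zlib

# A digit string like the ones in the question: 7000 bytes of "1234567890"
# repeated. PostgreSQL's TOAST compressor is pglz/LZ4, not zlib -- this is
# only an illustration of the general principle.
original = ("1234567890" * 700).encode()
compressed = zlib.compress(original)

print(len(original), len(compressed))
```

Any general-purpose compressor collapses this kind of repetition to a tiny fraction of the original size, which is why pg_column_size can come in far below octet_length for such data.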
As for calculating the expected database size, that's a whole new question. As you can see from the following demo, it depends on things like how compressible your strings are.
Here's a demonstration of how octet_length can be bigger than pg_column_size, showing where TOAST kicks in. First, let's get the results on query output, where no TOAST comes into play:
regress=> SELECT octet_length(repeat('1234567890',(2^n)::integer)), pg_column_size(repeat('1234567890',(2^n)::integer)) FROM generate_series(0,12) n;
octet_length | pg_column_size
--------------+----------------
10 | 14
20 | 24
40 | 44
80 | 84
160 | 164
320 | 324
640 | 644
1280 | 1284
2560 | 2564
5120 | 5124
10240 | 10244
20480 | 20484
40960 | 40964
(13 rows)
Now let's store that same query output into a table and get the size of the stored rows:
regress=> CREATE TABLE blah AS SELECT repeat('1234567890',(2^n)::integer) AS data FROM generate_series(0,12) n;
SELECT 13
regress=> SELECT octet_length(data), pg_column_size(data) FROM blah;
octet_length | pg_column_size
--------------+----------------
10 | 11
20 | 21
40 | 41
80 | 81
160 | 164
320 | 324
640 | 644
1280 | 1284
2560 | 51
5120 | 79
10240 | 138
20480 | 254
40960 | 488
(13 rows)
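The small, uncompressed rows also show the per-value header overhead: in the first demo every value costs octet_length + 4 (the full 4-byte varlena header), while short values stored in a table use a 1-byte header, giving octet_length + 1 up to the 80-byte row. A rough sketch of that rule (an assumption-laden estimate, not PostgreSQL's actual code; the 126-byte short-header cutoff and the hypothetical helper name are mine):

```python
def estimated_pg_column_size(octet_len: int, stored: bool = False) -> int:
    """Estimate pg_column_size() for an *uncompressed* text/varchar value.

    - Values computed on the fly carry the full 4-byte varlena header.
    - Values stored in a table may use the 1-byte "short" varlena header
      when the total (data + header) fits within 126 bytes.
    """
    if stored and octet_len + 1 <= 126:
        return octet_len + 1   # short varlena header
    return octet_len + 4       # full 4-byte varlena header

# Matches the demo output above for the rows TOAST leaves alone:
print(estimated_pg_column_size(10))               # query output: 14
print(estimated_pg_column_size(10, stored=True))  # in-table:     11
print(estimated_pg_column_size(160, stored=True)) # too long:    164
```

Once values get long enough to be compressed or moved out of line, this simple formula no longer applies, which is exactly what the 2560-byte-and-up rows show.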