Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can pg_column_size be smaller than octet_length?

Tags:

postgresql

I'm looking for getting anticipated table size by referring column type and length size. I'm trying to use pg_column_size for this.

When testing the function, I realized something seems wrong with this function.

The result value from pg_column_size(...) is sometimes even smaller than the return value from octet_length(...) on the same string.

There is nothing but numeric characters in the column.

postgres=# \d+ t5
                           Table "public.t5"
 Column |       Type        | Modifiers | Storage  | Stats target | Description 
--------+-------------------+-----------+----------+--------------+-------------
 c1     | character varying |           | extended |              | 
Has OIDs: no

postgres=# select pg_column_size(c1), octet_length(c1) as octet from t5;
 pg_column_size | octet 
----------------+-------
              2 |     1
            704 |   700
            101 |  7000
            903 | 77000
(4 rows)

Is this the bug or something? Is there someone with the some formula to calculate anticipated table size from column types and length values of it?

like image 468
KIM Avatar asked Nov 09 '12 08:11

KIM


1 Answers

I'd say pg_column_size is reporting the compressed size of TOASTed values, while octet_length is reporting the uncompressed sizes. I haven't verified this by checking the function source or definitions, but it'd make sense, especially as strings of numbers will compress quite well. You're using EXTENDED storage so the values are eligible for TOAST compression. See the TOAST documentation.

As for calculating expected DB size, that's whole new question. As you can see from the following demo, it depends on things like how compressible your strings are.

Here's a demonstration showing how octet_length can be bigger than pg_column_size, demonstrating where TOAST kicks in. First, let's get the results on query output where no TOAST comes into play:

regress=> SELECT octet_length(repeat('1234567890',(2^n)::integer)), pg_column_size(repeat('1234567890',(2^n)::integer)) FROM generate_series(0,12) n;
 octet_length | pg_column_size 
--------------+----------------
           10 |             14
           20 |             24
           40 |             44
           80 |             84
          160 |            164
          320 |            324
          640 |            644
         1280 |           1284
         2560 |           2564
         5120 |           5124
        10240 |          10244
        20480 |          20484
        40960 |          40964
(13 rows)

Now let's store that same query output into a table and get the size of the stored rows:

regress=> CREATE TABLE blah AS SELECT repeat('1234567890',(2^n)::integer) AS data FROM generate_series(0,12) n;
SELECT 13

regress=> SELECT octet_length(data), pg_column_size(data) FROM blah;
 octet_length | pg_column_size 
--------------+----------------
           10 |             11
           20 |             21
           40 |             41
           80 |             81
          160 |            164
          320 |            324
          640 |            644
         1280 |           1284
         2560 |             51
         5120 |             79
        10240 |            138
        20480 |            254
        40960 |            488
(13 rows)
like image 108
Craig Ringer Avatar answered Sep 22 '22 15:09

Craig Ringer