
How to calculate max columns in Postgresql

Tags:

sql

postgresql

I would like to know the correct way to calculate the max number of columns in a PostgreSQL table. It says on their website:

Maximum Columns per Table 250 - 1600 depending on column types

So, depending on the column type, how do I determine the max number of columns?

asked Oct 05 '14 by Luke101


1 Answer

You need to look into the details of physical data storage in PostgreSQL, namely the Page Layout.

  1. As you might know, the default PostgreSQL block size is 8 kB (8192 bytes). You should also be aware that in PostgreSQL table rows cannot span block boundaries. This already gives you a size limit of 8192 bytes. But…

  2. Looking at the above Page Layout, there's also overhead for the PageHeader, which is 24 bytes on the current PostgreSQL version. So, we're left with 8168 bytes. But…

  3. There's also ItemIdData, which is an array of pointers to the actual items. Let's assume we have only 1 record on this page, so this array occupies only 4 bytes (1 entry). So, we're left with 8164 bytes. But…

  4. Each record also has a RecordHeader, known to occupy 23 bytes. So, we're left with 8141 bytes. But…

  5. There's also a NULL-bitmap right after the RecordHeader, but let's assume we've defined all our columns with a NOT NULL constraint, so no bitmap is needed. Same 8141 bytes here. But…

  6. There's also MAXALIGN to take into account. Take a look at this wonderful answer by Erwin. We're speaking of a 24+4+23=51 byte offset here. Now everything will depend on the value of this parameter on your system (you can check these numbers yourself with the sketch after this list).

    If it is a 32-bit one, then the offset will be aligned to 52, meaning we're wasting 1 more byte.

    If it is a 64-bit one, then the offset will be aligned to 56, meaning we're wasting 5 more bytes. My system is a 64-bit one, so I assume we're left with 8136 bytes.
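
If you want to verify these header sizes on your own system, the pageinspect extension (shipped with PostgreSQL's contrib modules) can show them. Below is a minimal sketch; the table name t_layout is just an illustrative choice, and the exact numbers depend on your version and architecture:

CREATE EXTENSION IF NOT EXISTS pageinspect;

CREATE TABLE t_layout (a int2 NOT NULL, b int2 NOT NULL);
INSERT INTO t_layout VALUES (1, 2);

-- "lower" reflects the 24-byte page header plus 4 bytes per item pointer
SELECT lower, upper, pagesize
  FROM page_header(get_raw_page('t_layout', 0));

-- t_hoff is the MAXALIGN-ed offset at which user data starts inside the tuple
SELECT lp, lp_off, lp_len, t_hoff
  FROM heap_page_items(get_raw_page('t_layout', 0));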

So this is the space we're left with. And now everything will depend on the types of the columns we've chosen and how they sit together (remember that MAXALIGN thing). Let's take int2 for all columns. Simple math shows we should be able to squeeze in 4068 columns of this type: all columns NOT NULL and of the same type.

Simple script:

echo "CREATE TABLE tab4069 (" > tab4069.sql
for num in $(seq -f "%04g" 1 4069); do
  echo "    col$num  int2 not null," >> tab4069.sql; done
echo "    PRIMARY KEY (col0001) );" >> tab4069.sql

Still, if you try to create this table, you'll hit the error:

ERROR: tables can have at most 1600 columns

A bit of searching points to a similar question and, looking into the PostgreSQL sources, we get the answer (lines 23 to 47):

/*
 * MaxTupleAttributeNumber limits the number of (user) columns in a tuple.
 * The key limit on this value is that the size of the fixed overhead for
 * a tuple, plus the size of the null-values bitmap (at 1 bit per column),
 * plus MAXALIGN alignment, must fit into t_hoff which is uint8.  On most
 * machines the upper limit without making t_hoff wider would be a little
 * over 1700.  We use round numbers here and for MaxHeapAttributeNumber
 * so that alterations in HeapTupleHeaderData layout won't change the
 * supported max number of columns.
 */
#define MaxTupleAttributeNumber 1664        /* 8 * 208 */

/*
 * MaxHeapAttributeNumber limits the number of (user) columns in a table.
 * This should be somewhat less than MaxTupleAttributeNumber.  It must be
 * at least one less, else we will fail to do UPDATEs on a maximal-width
 * table (because UPDATE has to form working tuples that include CTID).
 * In practice we want some additional daylight so that we can gracefully
 * support operations that add hidden "resjunk" columns, for example
 * SELECT * FROM wide_table ORDER BY foo, bar, baz.
 * In any case, depending on column data types you will likely be running
 * into the disk-block-based limit on overall tuple size if you have more
 * than a thousand or so columns.  TOAST won't help.
 */
#define MaxHeapAttributeNumber  1600       /* 8 * 200 */
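
To see both limits side by side, here's a rough plpgsql sketch; the table name wide1600 and the int8 column type are my own choices for illustration. The CREATE succeeds because 1600 columns is exactly MaxHeapAttributeNumber, but a fully populated row of that width could never fit into a single 8 kB page:

DO $$
DECLARE
  cols text;
BEGIN
  -- build a column list of 1600 fixed-width NOT NULL columns
  SELECT string_agg(format('col%s int8 NOT NULL DEFAULT 0', i), ', ' ORDER BY i)
    INTO cols
    FROM generate_series(1, 1600) AS i;

  -- succeeds: 1600 columns is exactly the hard limit
  EXECUTE format('CREATE TABLE wide1600 (%s)', cols);

  -- uncommenting this should fail with a "row is too big" error,
  -- since 1600 * 8 bytes of data cannot fit into one 8 kB heap page
  -- EXECUTE 'INSERT INTO wide1600 DEFAULT VALUES';
END $$;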

There are lots of variable-length types, and they carry a fixed overhead of 1 or 4 bytes plus some number of bytes for the actual value. This means you'll never know in advance how much space a record will take until you have the actual values. Of course, such values might be stored separately via TOAST, but typically only the bigger ones (around 2 kB of total length).

Please consult the official docs on data types to find out the space used by fixed-length types. You can also check the output of the pg_column_size() function for any value, especially for complex ones like arrays, hstore, or jsonb.
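
For instance, a quick way to compare per-value sizes (the commented sizes for the variable-length values below depend on version and architecture, so treat them as approximate):

SELECT pg_column_size(1::int2)      AS int2_bytes,        -- 2
       pg_column_size(1::int4)      AS int4_bytes,        -- 4
       pg_column_size(1::int8)      AS int8_bytes,        -- 8
       pg_column_size(now())        AS timestamptz_bytes, -- 8
       pg_column_size('abc'::text)  AS text_bytes,        -- varlena header + 3 bytes of data
       pg_column_size(ARRAY[1,2,3]) AS int_array_bytes;   -- array header + 3 int4 elements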

You'll have to dig into more details if you want a more complete picture of this topic, though.

answered Oct 11 '22 by vyegorov