
How to calculate max columns in Postgresql

Tags:

sql

postgresql

I would like to know the correct way to calculate the max number of columns in a PostgreSQL table. It says on their website:

Maximum Columns per Table 250 - 1600 depending on column types

So, depending on the column type, how do I determine the max number of columns?

asked Oct 05 '14 by Luke101


1 Answer

You need to look into the details of physical data storage in PostgreSQL, namely the Page Layout.

  1. As you might know, the default PostgreSQL block size is 8 kB (8192 bytes). You should also be aware that in PostgreSQL table rows cannot span block boundaries. This already gives you a size limit of 8192 bytes. But…

  2. Looking at the above Page Layout, there's also overhead for the PageHeader, which is 24 bytes on the current PostgreSQL version. So, we're left with 8168 bytes. But…

  3. There's also ItemIdData, which is an array of pointers to the actual items. Let's assume we have only 1 record on this page, so this array occupies only 4 bytes (1 entry). So, we're left with 8164 bytes. But…

  4. Each record also has a RecordHeader, known to occupy 23 bytes. So, we're left with 8141 bytes. But…

  5. There's also a NULL-bitmap right after the RecordHeader, but let's assume we've defined all our columns with a NOT NULL constraint, so no bitmap is needed. Same 8141 bytes here. But…

  6. There's also MAXALIGN to take into account. Take a look at this wonderful answer by Erwin. We're speaking of a 24+4+23=51 byte offset here. Now everything will depend on the value of this parameter on your system (you can check these numbers yourself with the sketch after this list).

    If it is a 32-bit one, then the offset will be aligned to 52, meaning we're wasting 1 more byte.

    If it is a 64-bit one, then the offset will be aligned to 56, meaning we're wasting 5 more bytes. My system is a 64-bit one, so I assume we're left with 8136 bytes.
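
If you want to verify these header sizes on your own system, the pageinspect extension (shipped with PostgreSQL's contrib modules) can show them. Below is a minimal sketch; the table name t_layout is just an illustrative choice, and the exact numbers depend on your version and architecture:

CREATE EXTENSION IF NOT EXISTS pageinspect;

CREATE TABLE t_layout (a int2 NOT NULL, b int2 NOT NULL);
INSERT INTO t_layout VALUES (1, 2);

-- "lower" reflects the 24-byte page header plus 4 bytes per item pointer
SELECT lower, upper, pagesize
  FROM page_header(get_raw_page('t_layout', 0));

-- t_hoff is the MAXALIGN-ed offset at which user data starts inside the tuple
SELECT lp, lp_off, lp_len, t_hoff
  FROM heap_page_items(get_raw_page('t_layout', 0));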

So this is the space we're left with. And now everything will depend on the types of the columns we've chosen and how they sit together (remember that MAXALIGN thing). Let's take int2 for all columns. Simple math shows we should be able to squeeze in 4068 columns of this type: all columns NOT NULL and of the same type.

Simple script:

echo "CREATE TABLE tab4069 (" > tab4069.sql
for num in $(seq -f "%04g" 1 4069); do
  echo "    col$num  int2 not null," >> tab4069.sql; done
echo "    PRIMARY KEY (col0001) );" >> tab4069.sql

Still, if you try to create this table, you'll hit the error:

ERROR: tables can have at most 1600 columns

A bit of searching points to a similar question and, looking into the PostgreSQL sources, we get the answer (lines 23 to 47):

/*
 * MaxTupleAttributeNumber limits the number of (user) columns in a tuple.
 * The key limit on this value is that the size of the fixed overhead for
 * a tuple, plus the size of the null-values bitmap (at 1 bit per column),
 * plus MAXALIGN alignment, must fit into t_hoff which is uint8.  On most
 * machines the upper limit without making t_hoff wider would be a little
 * over 1700.  We use round numbers here and for MaxHeapAttributeNumber
 * so that alterations in HeapTupleHeaderData layout won't change the
 * supported max number of columns.
 */
#define MaxTupleAttributeNumber 1664        /* 8 * 208 */

/*
 * MaxHeapAttributeNumber limits the number of (user) columns in a table.
 * This should be somewhat less than MaxTupleAttributeNumber.  It must be
 * at least one less, else we will fail to do UPDATEs on a maximal-width
 * table (because UPDATE has to form working tuples that include CTID).
 * In practice we want some additional daylight so that we can gracefully
 * support operations that add hidden "resjunk" columns, for example
 * SELECT * FROM wide_table ORDER BY foo, bar, baz.
 * In any case, depending on column data types you will likely be running
 * into the disk-block-based limit on overall tuple size if you have more
 * than a thousand or so columns.  TOAST won't help.
 */
#define MaxHeapAttributeNumber  1600       /* 8 * 200 */
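
To see both limits side by side, here's a rough plpgsql sketch; the table name wide1600 and the int8 column type are my own choices for illustration. The CREATE succeeds because 1600 columns is exactly MaxHeapAttributeNumber, but a fully populated row of that width could never fit into a single 8 kB page:

DO $$
DECLARE
  cols text;
BEGIN
  -- build a column list of 1600 fixed-width NOT NULL columns
  SELECT string_agg(format('col%s int8 NOT NULL DEFAULT 0', i), ', ' ORDER BY i)
    INTO cols
    FROM generate_series(1, 1600) AS i;

  -- succeeds: 1600 columns is exactly the hard limit
  EXECUTE format('CREATE TABLE wide1600 (%s)', cols);

  -- uncommenting this should fail with a "row is too big" error,
  -- since 1600 * 8 bytes of data cannot fit into one 8 kB heap page
  -- EXECUTE 'INSERT INTO wide1600 DEFAULT VALUES';
END $$;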

There are lots of variable-length types, and they carry a fixed overhead of 1 or 4 bytes plus some number of bytes for the actual value. This means you'll never know in advance how much space a record will take until you have the actual values. Of course, such values might be stored separately via TOAST, but typically only the bigger ones (around 2 kB of total length).

Please consult the official docs on data types to find out the space used by fixed-length types. You can also check the output of the pg_column_size() function for any value, especially for complex ones like arrays, hstore, or jsonb.
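
For instance, a quick way to compare per-value sizes (the commented sizes for the variable-length values below depend on version and architecture, so treat them as approximate):

SELECT pg_column_size(1::int2)      AS int2_bytes,        -- 2
       pg_column_size(1::int4)      AS int4_bytes,        -- 4
       pg_column_size(1::int8)      AS int8_bytes,        -- 8
       pg_column_size(now())        AS timestamptz_bytes, -- 8
       pg_column_size('abc'::text)  AS text_bytes,        -- varlena header + 3 bytes of data
       pg_column_size(ARRAY[1,2,3]) AS int_array_bytes;   -- array header + 3 int4 elements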

You'll have to dig into more details if you want a more complete picture of this topic, though.

answered Oct 11 '22 by vyegorov