Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Postgresql: In-Row vs Out-of-Row for text/varchar

Two part question:

  1. What is Postgresql behavior for storing text/varchars in-row vs out-of-row? Am I correct in thinking that with default settings, all columns will always be stored in-row until the 2kB size is reached?

  2. Do we have any control over the above behavior? Is there any way I can change the threshold for a specific column/table, or force a specific column to always be stored out-of-row?

I've read through PostGresql Toast documentation (http://www.postgresql.org/docs/8.3/static/storage-toast.html), but I don't see any option for changing the thresholds (default seems to be 2kB-for-row) or for forcing a column to always store out-of-row (EXTERNAL only allows it, but doesn't enforce it).

I've found documentation explaining how to do this on SQL Server (https://msdn.microsoft.com/en-us/library/ms173530.aspx), but don't see anything similar for PostGresql.


If anyone's interested in my motivation, I have a table that has a mix of short-consistent columns (IDs, timestamps etc), a column that is varchar(200), and a column that is text/varchar(max), which can be extremely large in length. I currently have both varchars stored in a separate table, just to allow efficient storage/lookups/scanning on the short-consistent columns.

This is a pain however, because I constantly have to do joins to read all the data. I would really like to store all the above fields in the same table, and tell Postgresql to force-store the 2 VARCHARs out-of-row, always.

like image 983
RvPr Avatar asked Jul 30 '15 13:07

RvPr


1 Answers

Edited Answer

For first part of the question: you are correct (see for instance this).

For the second part of the question: the standard way of storing columns is to compress variable length text fields if their size is over 2KB, and eventually store them into a separate area, called “TOAST table”.

You can give a “hint” to the system on how to store a field by using the following command for your columns:

ALTER TABLE YourTable
  ALTER COLUMN YourColumn SET STORAGE (PLAIN | EXTENDED | EXTERNAL | MAIN)

From the manual:

SET STORAGE

This form sets the storage mode for a column. This controls whether this column is held inline or in a secondary TOAST table, and whether the data should be compressed or not. PLAIN must be used for fixed-length values such as integer and is inline, uncompressed. MAIN is for inline, compressible data. EXTERNAL is for external, uncompressed data, and EXTENDED is for external, compressed data. EXTENDED is the default for most data types that support non-PLAIN storage. Use of EXTERNAL will make substring operations on very large text and bytea values run faster, at the penalty of increased storage space. Note that SET STORAGE doesn't itself change anything in the table, it just sets the strategy to be pursued during future table updates. See Section 59.2 for more information.

Since the manual is not completely explicit on this point, this is my interpretation: the final decision about how to store the field is left in any case to the system, given the following constraints:

  1. No field can be stored such that the total size of a row is over 8KB
  2. No field is stored out-of-row if its size is less then the TOAST_TUPLE_THRESHOLD
  3. After satisfying the previous constraints, the system tries to satisfy the SET STORAGE strategy specified by the user. If no storage strategy is specified, each TOAST-able field is automatically declared EXTENDED.

Under these assumption, the only way to be sure that all the values of a column are stored out-of-row is to recompile the system with a value of TOAST_TUPLE_THRESHOLD less then the minumum size of any value of the column.

like image 142
Renzo Avatar answered Oct 23 '22 09:10

Renzo