PostgreSQL Upsert differentiate inserted and updated rows using system columns XMIN, XMAX and others

Tags:

Disclaimer: theoretical question.

Several questions here was asked about how to differentiate inserted and updated rows in the PostgreSQL upsert statement.

Here is a simple example:

create table t(i int primary key, x int); insert into t values(1,1); insert into t values(1,11),(2,22)     on conflict(i) do update set x = excluded.i*11     returning *, xmin, xmax;  ╔═══╤════╤══════╤══════╗ ║ i │ x  │ xmin │ xmax ║ ╠═══╪════╪══════╪══════╣ ║ 1 │ 11 │ 7696 │ 7696 ║ ║ 2 │ 22 │ 7696 │    0 ║ ╚═══╧════╧══════╧══════╝

So, xmax > 0 (or xmax = xmin) - row was updated; xmax = 0 - row was inserted.

IMO It is not too clear explained the meanings of the xmin and xmax columns here.

Is it possible to base the logic on those columns? Is there any more significant explanation about system columns (except the source code)?

And finally is my guess right about updated/inserted rows?

528

asked Aug 20 '16 20:08

Abelisto

1 Answers

I think that this is an interesting question that deserves an in-depth answer; please bear with me if it is a little bit lengthy.

In short: Your guess is right, and you can use the following RETURNING clause to determine if the row was inserted and not updated:

RETURNING (xmax = 0) AS inserted

Now the detailed explanation:

When a row is updated, PostgreSQL does not modify the data, but creates a new version of the row; the old version will be deleted by autovacuum when it is no longer needed. A version of a row is called a tuple, so in PostgreSQL there can be more than one tuples per row.

xmax serves two different purposes:

As stated in the documentation, it can be the transaction ID of the transaction that deleted (or updated) the tuple (“tuple” is another word for “row”). Only transactions with a transaction ID between xmin and xmax can see the tuple. An old tuple can be deleted safely if there is no transaction with a transaction ID less than xmax.
xmax is also used to store row locks. In PostgreSQL, row locks are not stored in the lock table, but in the tuple to avoid overflow of the lock table.
If only one transaction has a lock on the row, xmax will contain the transaction ID of the locking transaction. If more than one transaction has a lock on the row, xmax contains the number of a so-called multixact, which is a data structure that in turn contains the transaction IDs of the locking transactions.

The documentation of xmax is not complete, because the exact meaning of this field is considered an implementation detail and cannot be understood without knowing t_infomask of the tuple, which is not immediately visible via SQL.

You can install the contrib module pageinspect to view this and other fields of a tuple.

I ran your example, and this is what I see when I use the heap_page_items function to examine details (the transaction ID numbers are of course different in my case):

SELECT *, ctid, xmin, xmax FROM t;  ┌───┬────┬───────┬────────┬────────┐ │ i │ x  │ ctid  │  xmin  │  xmax  │ ├───┼────┼───────┼────────┼────────┤ │ 1 │ 11 │ (0,2) │ 102508 │ 102508 │ │ 2 │ 22 │ (0,3) │ 102508 │      0 │ └───┴────┴───────┴────────┴────────┘ (2 rows)  SELECT lp, lp_off, t_xmin, t_xmax, t_ctid,        to_hex(t_infomask) AS t_infomask, to_hex(t_infomask2) AS t_infomask2 FROM heap_page_items(get_raw_page('laurenz.t', 0));  ┌────┬────────┬────────┬────────┬────────┬────────────┬─────────────┐ │ lp │ lp_off │ t_xmin │ t_xmax │ t_ctid │ t_infomask │ t_infomask2 │ ├────┼────────┼────────┼────────┼────────┼────────────┼─────────────┤ │  1 │   8160 │ 102507 │ 102508 │ (0,2)  │ 500        │ 4002        │ │  2 │   8128 │ 102508 │ 102508 │ (0,2)  │ 2190       │ 8002        │ │  3 │   8096 │ 102508 │      0 │ (0,3)  │ 900        │ 2           │ └────┴────────┴────────┴────────┴────────┴────────────┴─────────────┘ (3 rows)

The meanings of t_infomask and t_infomask2 can be found in src/include/access/htup_details.h. lp_off is the offset of the tuple data in the page, and t_ctid is the current tuple ID which consists of the page number and a tuple number within the page. Since the table was newly created, all data are in page 0.

Let me discuss the three rows returned by heap_page_items.

At line pointer (lp) 1 we find the old, updated tuple. It originally had ctid = (0,1), but that got modified to contain the tuple ID of the current version during update. The Tuple was created by transaction 102507 and invalidated by transaction 102508 (the transaction that issued the INSERT ... ON CONFLICT). This tuple is not visible any more an will get removed during VACUUM.

t_infomask shows that both xmin and xmax belong to committed transactions and consequently show when the tuples was created and deleted. t_infomask2 shows that the tuple was updated with a HOT (heap only tuple) update, which means that the updated tuple is in the same page as the original tuple and no indexed column was modified (see src/backend/access/heap/README.HOT).
At line pointer 2 we see the new, updated tuple that was created by transaction the INSERT ... ON CONFLICT (transaction 102508).

t_infomask shows that this tuple is the result of an update, xmin is valid, and xmax contains a KEY SHARE row lock (which is no longer relevant since the transaction has completed). This row lock was taken during INSERT ... ON CONFLICT processing. t_infomask2 shows that this is a HOT tuple.
At line pointer 3 we see the newly inserted row.

t_infomask shows that xmin is valid and xmax is invalid. xmax is set to 0 because this value is always used for newly inserted tuples.

So the nonzero xmax of the updated row is an implementation artifact caused by a row lock. It is conceivable that INSERT ... ON CONFLICT is reimplemented one day so that this behaviour changes, but I think that is unlikely.

answered Oct 21 '22 15:10

Laurenz Albe

Related questions
                            
                                Alembic: alter column type with USING
                            
                                How to find number of rows in heroku database
                            
                                Combine two columns and add into one new column
                            
                                get last three month records from table
                            
                                Sequence does not reset after truncating the table
                            
                                Postgresql Windows, is there a default password? [closed]
                            
                                Is the server running on host "localhost" (::1) and accepting TCP/IP connections on port 5432? [closed]
                            
                                Any Postgres compatible ORM for Node.js? [closed]
                            
                                Difference between RESTRICT and NO ACTION
                            
                                Converting SELECT DISTINCT ON queries from Postgresql to MySQL
                            
                                Rails 3.1 - Pushing to Heroku - Errors installing postgres adapter?
                            
                                How to dump a postgres db excluding one specific table?
                            
                                What are PostgreSQL RULEs good for?
                            
                                Why can I create a table with PRIMARY KEY on a nullable column?
                            
                                Does PostgreSQL support transparent compressing of tables (fragments)?
                            
                                Maximum size of "TEXT" datatype in postgresql
                            
                                org.postgresql.util.PSQLException: ERROR: column user0_.id does not exist - Hibernate
                            
                                Use psql's \copy for a multi-line query
                            
                                Job queue as SQL table with multiple consumers (PostgreSQL)
                            
                                SQL Transpose Rows as Columns

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

PostgreSQL Upsert differentiate inserted and updated rows using system columns XMIN, XMAX and others

Tags:

postgresql

upsert

postgresql-9.5

Abelisto

People also ask

1 Answers

Laurenz Albe

Recent Activity

Donate For Us