I'm finding that using writable CTEs to emulate an upsert in PostgreSQL to be quite an elegant solution until we get actual upsert/merge in Postgres. (see: https://stackoverflow.com/a/8702291/558819)
However, there is one problem: how can I insert the default value? Using NULL
won't help of course as NULL
gets explicitly inserted as NULL
, unlike for example with MySQL. An example:
WITH new_values (id, playlist, item, group_name, duration, sort, legacy) AS (
VALUES (651, 21, 30012, 'a', 30, 1, FALSE)
, (NULL::int, 21, 1, 'b', 34, 2, NULL::boolean)
, (668, 21, 30012, 'c', 30, 3, FALSE)
, (7428, 21, 23068, 'd', 0, 4, FALSE)
), upsert AS (
UPDATE playlist_items m
SET (playlist, item, group_name, duration, sort, legacy)
= (nv.playlist, nv.item, nv.group_name, nv.duration, nv.sort, nv.legacy)
FROM new_values nv
WHERE nv.id = m.id
RETURNING m.id
)
INSERT INTO playlist_items (playlist, item, group_name, duration, sort, legacy)
SELECT playlist, item, group_name, duration, sort, legacy
FROM new_values nv
WHERE NOT EXISTS (SELECT 1
FROM upsert m
WHERE nv.id = m.id)
RETURNING id
So I'd like for example for the legacy
column to take on its default value for the second VALUES
row.
I've tried a few things, such as explicitly using DEFAULT
in the VALUES list, which doesn't work because the CTE has no idea what it's inserting in. I've also tried coalesce(col, DEFAULT)
in the insert statement which didn't seem to work either. So, is it possible to do what I want?
PostgreSQL lets you either add or modify a record within a table depending on whether the record already exists. This is commonly known as an "upsert" operation (a portmanteau of "insert" and "update").
In PostgreSQL, the UPSERT operation means either UPDATE or INSERT operation. The UPSERT operation allows us to either insert a row or skip the insert operation if a row already exists and update that row instead. Suppose you want to insert bulk data from one table to another table that already has some data.
In PostgreSQL, the CTE(Common Table Expression) is used as a temporary result set that the user can reference within another SQL statement like SELECT, INSERT, UPDATE or DELETE. CTEs are temporary in the sense that they only exist during the execution of the query.
Postgres 9.5 implemented UPSERT
. See below.
This is a tricky problem. You are running into this restriction (per documentation):
In a
VALUES
list appearing at the top level of anINSERT
, an expression can be replaced byDEFAULT
to indicate that the destination column's default value should be inserted.DEFAULT
cannot be used whenVALUES
appears in other contexts.
Bold emphasis mine. Default values are not defined without a table to insert into. So there is no direct solution to your question, but there is a number of possible alternative routes, depending on exact requirements.
You could fetch those from the system catalog pg_attrdef
like @Patrick commented or from information_schema.columns
. Complete instructions here:
But then you still only have a list of rows with a text representation of the expression to cook the default value. You would have to build and execute statements dynamically to get values to work with. Tedious and messy. Instead, we can let built-in Postgres functionality do that for us:
Insert a dummy row and have it returned to use generated defaults:
INSERT INTO playlist_items DEFAULT VALUES RETURNING *;
STABLE
or IMMUTABLE
default expressions. Most VOLATILE
functions will work just as well, but there are no guarantees. The current_timestamp
family of functions qualify as stable, since their values do not change within a transaction.serial
columns (or any other defaults drawing from a sequence). But that should not be a problem, because you don't normally write to serial
columns directly. Those shouldn't be listed in INSERT
statements at all.serial
columns: the sequence is still advanced by the single call to get a default row, producing a gap in the numbering. Again, that should not be a problem, because gaps are generally to be expected in serial
columns.Two more problems can be solved:
If you have columns defined NOT NULL
, you have to insert dummy values and replace with NULL
in the result.
We do not actually want to insert the dummy row. We could delete later (in the same transaction), but that may have more side effects, like triggers ON DELETE
. There is a better way:
Clone a temporary table including column defaults and insert into that:
BEGIN;
CREATE TEMP TABLE tmp_playlist_items (LIKE playlist_items INCLUDING DEFAULTS)
ON COMMIT DROP; -- drop at end of transaction
INSERT INTO tmp_playlist_items DEFAULT VALUES RETURNING *;
...
Same result, fewer side effects. Since default expressions are copied verbatim, the clone draws from the same sequences if any. But other side effects from the unwanted row or triggers are avoided completely.
Credit to Igor for the idea:
NOT NULL
constraintsYou would have to provide dummy values for NOT NULL
columns, because (per documentation):
Not-null constraints are always copied to the new table.
Either accommodate for those in the INSERT
statement or (better) eliminate the constraints:
ALTER TABLE tmp_playlist_items
ALTER COLUMN foo DROP NOT NULL
, ALTER COLUMN bar DROP NOT NULL;
There is a quick and dirty way with superuser privileges:
UPDATE pg_attribute
SET attnotnull = FALSE
WHERE attrelid = 'tmp_playlist_items'::regclass
AND attnotnull
AND attnum > 0;
It is just a temporary table with no data and no other purpose, and it's dropped at the end of the transaction. So the shortcut is tempting. Still, the basic rule is: never tamper with system catalogs directly.
So, let's look into a clean way:
Automate with dynamic SQL in a DO
statement. You just need the regular privileges you are guaranteed to have since the same role created the temp table.
DO $$BEGIN
EXECUTE (
SELECT 'ALTER TABLE tmp_playlist_items ALTER '
|| string_agg(quote_ident(attname), ' DROP NOT NULL, ALTER ')
|| ' DROP NOT NULL'
FROM pg_catalog.pg_attribute
WHERE attrelid = 'tmp_playlist_items'::regclass
AND attnotnull
AND attnum > 0
);
END$$
Much cleaner and still very fast. Execute care with dynamic commands and be wary of SQL injection. This statement is safe. I have posted several related answers with more explanation.
BEGIN;
CREATE TEMP TABLE tmp_playlist_items
(LIKE playlist_items INCLUDING DEFAULTS) ON COMMIT DROP;
DO $$BEGIN
EXECUTE (
SELECT 'ALTER TABLE tmp_playlist_items ALTER '
|| string_agg(quote_ident(attname), ' DROP NOT NULL, ALTER ')
|| ' DROP NOT NULL'
FROM pg_catalog.pg_attribute
WHERE attrelid = 'tmp_playlist_items'::regclass
AND attnotnull
AND attnum > 0
);
END$$;
LOCK TABLE playlist_items IN EXCLUSIVE MODE; -- forbid concurrent writes
WITH default_row AS (
INSERT INTO tmp_playlist_items DEFAULT VALUES RETURNING *
)
, new_values (id, playlist, item, group_name, duration, sort, legacy) AS (
VALUES
(651, 21, 30012, 'a', 30, 1, FALSE)
, (NULL, 21, 1, 'b', 34, 2, NULL)
, (668, 21, 30012, 'c', 30, 3, FALSE)
, (7428, 21, 23068, 'd', 0, 4, FALSE)
)
, upsert AS ( -- *not* replacing existing values in UPDATE (?)
UPDATE playlist_items m
SET ( playlist, item, group_name, duration, sort, legacy)
= (n.playlist, n.item, n.group_name, n.duration, n.sort, n.legacy)
-- ..., COALESCE(n.legacy, m.legacy) -- see below
FROM new_values n
WHERE n.id = m.id
RETURNING m.id
)
INSERT INTO playlist_items
(playlist, item, group_name, duration, sort, legacy)
SELECT n.playlist, n.item, n.group_name, n.duration, n.sort
, COALESCE(n.legacy, d.legacy)
FROM new_values n, default_row d -- single row can be cross-joined
WHERE NOT EXISTS (SELECT 1 FROM upsert u WHERE u.id = n.id)
RETURNING id;
COMMIT;
You only need the LOCK
if you have concurrent transactions trying to write to the same table.
As requested, this only replaces NULL values in the column legacy
in the input rows for the INSERT
case. Can easily be extended to work for other columns or in the UPDATE
case as well. For instance, you could UPDATE
conditionally as well: only if the input value is NOT NULL
. I added a commented line to the UPDATE
above.
Aside: You do not need to cast values in any row but the first in a VALUES
expression, since types are derived from the first row.
implements UPSERT with INSERT .. ON CONFLICT .. DO NOTHING | UPDATE
. This largely simplifies the operation:
INSERT INTO playlist_items AS m (id, playlist, item, group_name, duration, sort, legacy)
VALUES (651, 21, 30012, 'a', 30, 1, FALSE)
, (DEFAULT, 21, 1, 'b', 34, 2, DEFAULT) -- !
, (668, 21, 30012, 'c', 30, 3, FALSE)
, (7428, 21, 23068, 'd', 0, 4, FALSE)
ON CONFLICT (id) DO UPDATE
SET (playlist, item, group_name, duration, sort, legacy)
= (EXCLUDED.playlist, EXCLUDED.item, EXCLUDED.group_name
, EXCLUDED.duration, EXCLUDED.sort, EXCLUDED.legacy)
-- (..., COALESCE(l.legacy, EXCLUDED.legacy)) -- see below
RETURNING m.id;
We can attach the VALUES
clause to INSERT
directly, which allows the DEFAULT
keyword. In the case of unique violations on (id)
, Postgres updates instead. We can use excluded rows in the UPDATE
. The manual:
The
SET
andWHERE
clauses inON CONFLICT DO UPDATE
have access to the existing row using the table's name (or an alias), and to rows proposed for insertion using the specialexcluded
table.
And:
Note that the effects of all per-row
BEFORE INSERT
triggers are reflected in excluded values, since those effects may have contributed to the row being excluded from insertion.
You have various options for the UPDATE
: You can ...
WHERE
clause to the UPDATE
to only write to selected rows.COALESCE(l.legacy, EXCLUDED.legacy)
NOT NULL
: COALESCE(EXCLUDED.legacy, l.legacy)
But there is no way to discern DEFAULT
values and values actually provided in the INSERT
. Only resulting EXCLUDED
rows are visible. If you need the distinction, fall back to the previous solution, where you have both at our disposal.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With