Using CTE and Update in Redshift

I am converting some SQL logic from T-SQL (used in SSMS) to Amazon Redshift. I believe Redshift is a fork of Postgres 8.0.2, so the statement below may not be possible without features from Postgres 9.1.

WITH CTE_ID AS 
(
SELECT FULL_NAME, COUNT(DISTINCT ID) as ID_COUNT, MAX(ID) AS MAX_ID
FROM MEMBERS
GROUP BY FULL_NAME
HAVING COUNT(DISTINCT ID) > 1
)
UPDATE a
SET a.ID = b.MAX_ID
FROM MEMBERS a
INNER JOIN CTE_ID b
ON a.FULL_NAME = b.FULL_NAME

If this feature is not supported by Amazon Redshift, would my best option be to create a new "temporary" table and populate it with the values the CTE would generate?

TZ100 asked Mar 26 '18 19:03



2 Answers

You can re-write the query as a derived table as mentioned by @a_horse_with_no_name:

UPDATE MEMBERS
SET a.ID = b.MAX_ID
FROM MEMBERS a
INNER JOIN (
  SELECT FULL_NAME, COUNT(DISTINCT ID) as ID_COUNT, MAX(ID) AS MAX_ID
  FROM MEMBERS
  GROUP BY FULL_NAME
  HAVING COUNT(DISTINCT ID) > 1
  ) b
ON a.FULL_NAME = b.FULL_NAME

jose_bacoy answered Sep 22 '22 14:09


Existing answers (including the accepted) are invalid. This should work:

UPDATE members AS a
SET    id = b.max_id
FROM  (
   SELECT full_name, max(id) AS max_id
   FROM   members
   GROUP  BY full_name
   HAVING count(DISTINCT id) > 1
   ) b
WHERE  a.full_name = b.full_name
AND    a.id IS DISTINCT FROM b.max_id;

No need for a CTE (though possible). A subquery is simpler.

The target table is only listed once. You'd only repeat it in the FROM clause with a (different) alias for special needs.

Target columns in the SET list cannot be table-qualified.

Unquoted names are folded to lower case in Redshift. UPPER case spelling only adds confusion.

I added the predicate AND a.id IS DISTINCT FROM b.max_id to skip updates on rows that would not change. (Expensive no-op.) You'd only want those in exotic cases to trigger (undeclared) side effects.
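
For completeness, here is a sketch of the "though possible" CTE form of the same statement. It assumes your Redshift release accepts a leading WITH clause on UPDATE, so check the manual for your cluster before relying on it. The CTE name dup is made up for illustration; everything else mirrors the query above.

```sql
-- Sketch only: same update, with the subquery moved into a CTE.
-- Assumption: this Redshift release allows WITH in front of UPDATE.
WITH dup AS (                      -- "dup" is a hypothetical name
   SELECT full_name, max(id) AS max_id
   FROM   members
   GROUP  BY full_name
   HAVING count(DISTINCT id) > 1   -- only names with more than one id
   )
UPDATE members AS a
SET    id = dup.max_id             -- target column is not table-qualified
FROM   dup
WHERE  a.full_name = dup.full_name
AND    a.id IS DISTINCT FROM dup.max_id;  -- skip rows that would not change
```

The CTE buys nothing over the subquery here; it mainly keeps the shape closer to the original T-SQL.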

More in the Redshift manual for UPDATE.
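
As for the fallback raised in the question, a temporary table also works. The following is a minimal sketch, not a recommendation: the table name tmp_max_id is hypothetical, and the UPDATE is the same as above with the temp table in place of the subquery.

```sql
-- Step 1: materialize what the CTE would have produced.
-- "tmp_max_id" is a hypothetical name; the table is session-local.
CREATE TEMP TABLE tmp_max_id AS
SELECT full_name, max(id) AS max_id
FROM   members
GROUP  BY full_name
HAVING count(DISTINCT id) > 1;

-- Step 2: update from the temp table instead of a subquery.
UPDATE members AS a
SET    id = t.max_id
FROM   tmp_max_id t
WHERE  a.full_name = t.full_name
AND    a.id IS DISTINCT FROM t.max_id;

-- Step 3: optional, since the table goes away when the session ends.
DROP TABLE tmp_max_id;
```

This costs an extra write and an extra statement, so it only makes sense if the intermediate result is reused or needs to be inspected before updating.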

Erwin Brandstetter answered Sep 26 '22 14:09