Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Bulk updating existing rows in Redshift

This seems like it should be easy, but isn't. I'm migrating a query from MySQL to Redshift of the form:

INSERT INTO table
(...)
VALUES
(...)
ON DUPLICATE KEY UPDATE
  value = MIN(value, VALUES(value))

For primary keys we're inserting that aren't already in the table, those are just inserted. For primary keys that are already in the table, we update the row's values based on a condition that depends on the existing and new values in the row.

http://docs.aws.amazon.com/redshift/latest/dg/merge-replacing-existing-rows.html does not work, because filter_expression in my case depends on the current entries in the table. I'm currently creating a staging table, inserting into it with a COPY statement and am trying to figure out the best way to merge the staging and real tables.

like image 586
moinudin Avatar asked Mar 20 '14 19:03

moinudin


People also ask

Can we update data in Redshift?

While Amazon Redshift does not support a single merge, or upsert, command to update a table from a single data source, you can perform a merge operation by creating a staging table and then using one of the methods described in this section to update the target table from the staging table.

How do you refresh a table in Redshift?

To update the data in a materialized view, you can use the REFRESH MATERIALIZED VIEW statement at any time. When you use this statement, Amazon Redshift identifies changes that have taken place in the base table or tables, and then applies those changes to the materialized view.

Does coalesce work on Redshift?

Redshift coalesce function is used to return the first not null value from table column or argument list which was we have used in query. This function will evaluates values or arguments from left to right to find the not-null value or argument.


1 Answers

I'm having to do exactly this for a project right now. The method I'm using involves 3 steps:

1.

Run an update that addresses changed fields (I'm updating whether or not the fields have changed, but you can certainly qualify that):

update table1 set col1=s.col1, col2=s.col2,...
from table1 t
 join stagetable s on s.primkey=t.primkey;

2.

Run an insert that addresses new records:

insert into table1
select s.* 
from stagetable s 
 left outer join table1 t on s.primkey=t.primkey
where t.primkey is null;

3.

Mark rows no longer in the source as inactive (our reporting tool uses views that filter inactive records):

update table1 
set is_active_flag='N', last_updated=sysdate
from table1 t
 left outer join stagetable s on s.primkey=t.primkey
where s.primkey is null;
like image 117
mike_pdb Avatar answered Oct 03 '22 05:10

mike_pdb