This seems like it should be easy, but isn't. I'm migrating a query from MySQL to Redshift of the form:
INSERT INTO table
(...)
VALUES
(...)
ON DUPLICATE KEY UPDATE
value = MIN(value, VALUES(value))
For primary keys we're inserting that aren't already in the table, those are just inserted. For primary keys that are already in the table, we update the row's values based on a condition that depends on the existing and new values in the row.
http://docs.aws.amazon.com/redshift/latest/dg/merge-replacing-existing-rows.html does not work, because filter_expression
in my case depends on the current entries in the table. I'm currently creating a staging table, inserting into it with a COPY
statement and am trying to figure out the best way to merge the staging and real tables.
While Amazon Redshift does not support a single merge, or upsert, command to update a table from a single data source, you can perform a merge operation by creating a staging table and then using one of the methods described in this section to update the target table from the staging table.
To update the data in a materialized view, you can use the REFRESH MATERIALIZED VIEW statement at any time. When you use this statement, Amazon Redshift identifies changes that have taken place in the base table or tables, and then applies those changes to the materialized view.
Redshift coalesce function is used to return the first not null value from table column or argument list which was we have used in query. This function will evaluates values or arguments from left to right to find the not-null value or argument.
I'm having to do exactly this for a project right now. The method I'm using involves 3 steps:
1.
Run an update that addresses changed fields (I'm updating whether or not the fields have changed, but you can certainly qualify that):
update table1 set col1=s.col1, col2=s.col2,...
from table1 t
join stagetable s on s.primkey=t.primkey;
2.
Run an insert that addresses new records:
insert into table1
select s.*
from stagetable s
left outer join table1 t on s.primkey=t.primkey
where t.primkey is null;
3.
Mark rows no longer in the source as inactive (our reporting tool uses views that filter inactive records):
update table1
set is_active_flag='N', last_updated=sysdate
from table1 t
left outer join stagetable s on s.primkey=t.primkey
where s.primkey is null;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With