I have a table in a Redshift cluster with ~1 billion rows. I have a job that tries to update some column values based on some filter. Updating anything at all in this table is incredibly slow. Here's an example:
SELECT col1, col2, col3
FROM SOMETABLE
WHERE col1 = 'a value of col1'
AND col2 = 12;
The above query returns in less than a second, because I have sortkeys on col1 and col2. There is only one row that meets these criteria, so the result set is just one row. However, if I run:
UPDATE SOMETABLE
SET col3 = 20
WHERE col1 = 'a value of col1'
AND col2 = 12;
This query never finished; I killed it after 20 minutes. Again, it should be updating one column value of one row.
I have also tried following the documentation here: http://docs.aws.amazon.com/redshift/latest/dg/merge-specify-a-column-list.html, which describes creating a temporary staging table to update the main table, but I got the same results.
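For reference, what I tried looks roughly like this (the staging table name and the column types are just illustrative):

CREATE TEMP TABLE stage_sometable (
    col1 VARCHAR(256),   -- types assumed for illustration
    col2 INTEGER,
    col3 INTEGER
);

INSERT INTO stage_sometable VALUES ('a value of col1', 12, 20);

UPDATE SOMETABLE
SET col3 = s.col3
FROM stage_sometable s
WHERE SOMETABLE.col1 = s.col1
  AND SOMETABLE.col2 = s.col2;

DROP TABLE stage_sometable;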
Any idea what is going on here?
You didn't mention what percentage of the table you're updating, but it's important to note that an UPDATE in Redshift is a two-step process:

1. Each row that will be changed is first marked for deletion (the old version stays on disk until the table is vacuumed).
2. A new version of the row is then written, and because Redshift stores data by column, that means writing new values for every column of the table, not just the one you changed.

If you have a large number of columns and/or are updating a large number of rows, this process can be very expensive for the database.
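You can see both halves of this in the system tables: a single UPDATE logs rows in both stl_delete and stl_insert under the same query id (replace <query_id> with the id of your UPDATE statement, e.g. from stl_query):

SELECT query, tbl, SUM(rows) FROM stl_delete WHERE query = <query_id> GROUP BY query, tbl;
SELECT query, tbl, SUM(rows) FROM stl_insert WHERE query = <query_id> GROUP BY query, tbl;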
You could experiment with using a CREATE TABLE AS statement to create a new "updated" version of the table, then dropping the existing table and renaming the new one. This has the added benefit of leaving you with a fully sorted table.
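A minimal sketch of that approach, reusing the filter from your example (sometable_new is just an illustrative name, I'm assuming col1, col2, col3 are the only columns, and you'd also want to match the original table's DISTSTYLE/DISTKEY and compression settings):

CREATE TABLE sometable_new
SORTKEY (col1, col2)   -- same sort keys as the original table
AS
SELECT col1,
       col2,
       CASE WHEN col1 = 'a value of col1' AND col2 = 12
            THEN 20
            ELSE col3
       END AS col3
FROM SOMETABLE;

-- Once the new table is verified, swap it in
-- (DDL is transactional in Redshift):
BEGIN;
DROP TABLE SOMETABLE;
ALTER TABLE sometable_new RENAME TO SOMETABLE;
COMMIT;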