 

Amazon Redshift Keys are not enforced - how to prevent duplicate data?


Just testing out AWS Redshift. Having discovered some duplicate data on an insert that I'd hoped would simply fail on the duplicated key column, reading the docs reveals that primary key constraints aren't "enforced".

Has anyone figured out how to prevent duplication on a primary key (per the "traditional" expectation)?

Thanks to any Redshift pioneers!

Saeven asked Mar 02 '13 04:03



1 Answer

I assign UUIDs when the records are created. If the record is inherently unique, I use type 4 UUIDs (random); when it isn't, I use type 5 (SHA-1 hash) with the natural keys as input.
Then you can follow the instructions from AWS to perform UPSERTs quite easily. If your input has duplicates, you should be able to clean up your staging table by issuing SQL that looks something like this:
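The UUID scheme above can be sketched in Python with the standard-library `uuid` module. The namespace and natural-key string below are hypothetical placeholders; any fixed namespace and any stable encoding of the natural key will do:

```python
import uuid

# Type 4: random UUID, for records that are inherently unique
random_id = uuid.uuid4()

# Type 5: deterministic SHA-1-based UUID derived from a natural key,
# so re-ingesting the same record always yields the same ID.
# NAMESPACE and the key string here are assumptions for illustration.
NAMESPACE = uuid.NAMESPACE_URL
natural_key = "customer:42|2013-03-02"
stable_id = uuid.uuid5(NAMESPACE, natural_key)

# Deriving again from the same inputs reproduces the same UUID,
# which is what makes duplicate detection possible downstream.
assert stable_id == uuid.uuid5(NAMESPACE, natural_key)
```

Because type 5 IDs are deterministic, a duplicate source record maps to the same primary key value, so the staging-table cleanup below can catch it.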

CREATE TABLE cleaned AS
SELECT
  pk_field,
  field_1,
  field_2,
  ...
FROM (
  SELECT
    ROW_NUMBER() OVER (PARTITION BY pk_field ORDER BY pk_field) AS r,
    t.*
  FROM table1 t
) x
WHERE x.r = 1;
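The effect of the `ROW_NUMBER() ... WHERE x.r = 1` filter is to keep one arbitrary row per key. A minimal Python sketch of the same keep-first-per-key semantics (the field names are placeholders matching the SQL above):

```python
def dedupe_keep_first(rows, key="pk_field"):
    # Mirrors ROW_NUMBER() OVER (PARTITION BY key) ... WHERE r = 1:
    # the first row seen for each key value is kept, later ones dropped.
    seen = {}
    for row in rows:
        seen.setdefault(row[key], row)
    return list(seen.values())

rows = [
    {"pk_field": 1, "field_1": "a"},
    {"pk_field": 1, "field_1": "b"},  # duplicate key, dropped
    {"pk_field": 2, "field_1": "c"},
]
cleaned = dedupe_keep_first(rows)
```

Note that because the SQL orders each partition by the partition key itself, which row survives among duplicates is effectively arbitrary; order by a tiebreaker column (e.g. a load timestamp) if you care which copy wins.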
Enno Shioji answered Oct 09 '22 20:10