Amazon RedShift: Unique Column not being honored

Question

I use the following query to create my table.

create table t1 (url varchar(250) unique);

Then I insert about 500 urls, twice. I am expecting that the second time I had the URLs that no new entries show up in my table, but instead my count value doubles for:

select count(*) from t1;

What I want is that when I try and add a url that is already in my table, it is skipped. Have I declared something in my table deceleration incorrect?

I am using RedShift from AWS.

Sample

urlenrich=# insert into seed(url, source) select 'http://www.google.com', '1';
INSERT 0 1
urlenrich=# select * from seed;
          url          | wascrawled | source | date_crawled 
-----------------------+------------+--------+--------------
 http://www.google.com |          0 |      1 | 
(1 row)

urlenrich=# insert into seed(url, source) select 'http://www.google.com', '1';
INSERT 0 1
urlenrich=# select * from seed;
          url          | wascrawled | source | date_crawled 
-----------------------+------------+--------+--------------
 http://www.google.com |          0 |      1 | 
 http://www.google.com |          0 |      1 | 
(2 rows)

Output of \d seed

urlenrich=# \d seed

                  Table "public.seed"
    Column    |            Type             | Modifiers 
--------------+-----------------------------+-----------
 url          | character varying(250)      | 
 wascrawled   | integer                     | default 0
 source       | integer                     | not null
 date_crawled | timestamp without time zone | 
Indexes:
    "seed_url_key" UNIQUE, btree (url)

Dan Ciborowski - MSFT · Accepted Answer

Figured out the problem

Amazon RedShift does not enforce constraints...

As explained here http://docs.aws.amazon.com/redshift/latest/dg/t_Defining_constraints.html

They said they may get around to changing it at some point.

NEW 11/21/2013 RDS has added support for PostGres, if you need unique and such an postgres rds instance is now the best way to go.

Ricardo Edo · Answer

In redshift, constraints are recommended but doesn't take effect, constraints will just help to the query planner to select better ways to perform the query.

Usually, columnar databases do not manage indexes or constraints.

Masashi M · Answer

Although Amazon Redshift doesn't support unique constraints, there are some ways to delete duplicated records that can be helpful. See the following link for the details.

copy data from Amazon s3 to Red Shift and avoid duplicate rows

Amazon RedShift: Unique Column not being honored

Tags:

amazon-web-services

amazon-redshift

Dan Ciborowski - MSFT

3 Answers

Dan Ciborowski - MSFT

Ricardo Edo

Masashi M

Recent Activity

Donate For Us

Amazon RedShift: Unique Column not being honored

Tags:

amazon-web-services

amazon-redshift

Dan Ciborowski - MSFT

3 Answers

Dan Ciborowski - MSFT

Ricardo Edo

Masashi M

Related questions

Recent Activity

Donate For Us