Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Amazon RedShift: Unique Column not being honored

I use the following query to create my table.

create table t1 (url varchar(250) unique);

Then I insert about 500 urls, twice. I am expecting that the second time I had the URLs that no new entries show up in my table, but instead my count value doubles for:

select count(*) from t1;

What I want is that when I try and add a url that is already in my table, it is skipped. Have I declared something in my table deceleration incorrect?

I am using RedShift from AWS.

Sample

urlenrich=# insert into seed(url, source) select 'http://www.google.com', '1';
INSERT 0 1
urlenrich=# select * from seed;
          url          | wascrawled | source | date_crawled 
-----------------------+------------+--------+--------------
 http://www.google.com |          0 |      1 | 
(1 row)

urlenrich=# insert into seed(url, source) select 'http://www.google.com', '1';
INSERT 0 1
urlenrich=# select * from seed;
          url          | wascrawled | source | date_crawled 
-----------------------+------------+--------+--------------
 http://www.google.com |          0 |      1 | 
 http://www.google.com |          0 |      1 | 
(2 rows)

Output of \d seed

urlenrich=# \d seed

                  Table "public.seed"
    Column    |            Type             | Modifiers 
--------------+-----------------------------+-----------
 url          | character varying(250)      | 
 wascrawled   | integer                     | default 0
 source       | integer                     | not null
 date_crawled | timestamp without time zone | 
Indexes:
    "seed_url_key" UNIQUE, btree (url)
like image 852
Dan Ciborowski - MSFT Avatar asked Jul 19 '13 18:07

Dan Ciborowski - MSFT


3 Answers

Figured out the problem

Amazon RedShift does not enforce constraints...

As explained here http://docs.aws.amazon.com/redshift/latest/dg/t_Defining_constraints.html

They said they may get around to changing it at some point.

NEW 11/21/2013 RDS has added support for PostGres, if you need unique and such an postgres rds instance is now the best way to go.

like image 120
Dan Ciborowski - MSFT Avatar answered Nov 11 '22 23:11

Dan Ciborowski - MSFT


In redshift, constraints are recommended but doesn't take effect, constraints will just help to the query planner to select better ways to perform the query.

Usually, columnar databases do not manage indexes or constraints.

like image 37
Ricardo Edo Avatar answered Nov 11 '22 23:11

Ricardo Edo


Although Amazon Redshift doesn't support unique constraints, there are some ways to delete duplicated records that can be helpful. See the following link for the details.

copy data from Amazon s3 to Red Shift and avoid duplicate rows

like image 1
Masashi M Avatar answered Nov 12 '22 01:11

Masashi M