Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to restrict duplicate record to insert into table in snowflake

I have created below table with primary key in snowflake and whenever i am trying to insert data into this table, it's allow duplicate records also. How to restrict duplicate id ?

create table tab11(id int primary key not null,grade varchar(10));

insert into tab11 values(1,'A');
insert into tab11 values(1,'B');

select * from tab11;

Output: Inserted duplicate records.

ID  GRADE
1   A
1   B
like image 294
Neeraj Yadav Avatar asked Jan 23 '20 07:01

Neeraj Yadav


4 Answers

You may want to look at using a merge statement to handle what happens when a row with a duplicate PK arrives:

create table tab1(id int primary key not null, grade varchar(10));

insert into tab1 values(1, 'A');

-- Try merging values 1, and 'B': Nothing will be added
merge into tab1 using 
    (select * from (values (1, 'B')) x(id, grade)) tab2 
  on tab1.id = tab2.id
    when not matched then insert (id, grade)
                          values (tab2.id, tab2.grade);

select * from tab1;

-- Try merging values 2, and 'B': New row added
merge into tab1 using 
    (select * from (values (2, 'B')) x(id, grade)) tab2 
  on tab1.id = tab2.id
    when not matched then insert (id, grade)
                          values (tab2.id, tab2.grade);

select * from tab1;

-- If instead of ignoring dupes, we want to update:
merge into tab1 using 
    (select * from (values (1, 'F'), (2, 'F')) x(id, grade)) tab2 
  on tab1.id = tab2.id
    when matched then update set tab1.grade = tab2.grade
    when not matched then insert (id, grade)
                          values (tab2.id, tab2.grade);

select * from tab1;

For more complex merges, you may want to investigate using Snowflake streams (change data capture tables). In addition to the documentation, I have created a SQL script walk through of how to use a stream to keep a staging and prod table in sync:

https://snowflake.pavlik.us/index.php/2020/01/12/snowflake-streams-made-simple

like image 33
Greg Pavlik Avatar answered Oct 12 '22 18:10

Greg Pavlik


Snowflake allows you to identify a column as a Primary Key but it doesn't enforce uniqueness on them. From the documentation here:

Snowflake supports defining and maintaining constraints, but does not enforce them, except for NOT NULL constraints, which are always enforced.

A Primary Key in Snowflake is purely for informative purposes. I'm not from Snowflake, but I imagine that enforcing uniqueness in Primary Keys does not really align with how Snowflake stores data behind the scenes and it probably would impact insertion speed.

like image 145
Simon D Avatar answered Oct 12 '22 19:10

Simon D


You could try to use SEQUENCE to fit your requirement

https://docs.snowflake.net/manuals/user-guide/querying-sequences.html#using-sequences

like image 32
sprethepa Avatar answered Oct 12 '22 20:10

sprethepa


Snowflake does NOT enforce unique constraints, hence you can only mitigate the issue by:

  • using a SEQUENCE to populate the column you want to be unique;
  • defining the column as NOT NULL (which is effectively enforced);
  • using a stored procedure where you can programmatically ensure no duplicates are introduced;
  • using a stored procedure (which could be run by scheduled Task possibly) to de-duplicate on a regular basis;
like image 32
Francesco Quaratino Avatar answered Oct 12 '22 18:10

Francesco Quaratino