Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Vertica: Data validation of duplicate/primary key

I'm trying to create a validation procedure during a load that checks to make sure data isn't duplicated. Vertica doesn't support this natively:

Vertica checks for constraint violations when queries are run, not when data is loaded. To detect constraint violations as part of the load process, use a COPY (page 667) statement with the NO COMMIT option. By loading data without committing it, you can run a post-load check of your data using the ANALYZE_CONSTRAINTS function. If the function finds constraint violations, you can roll back the load because you have not committed it.

The problem is that I can't figure out how to do this programmatically. I suspect that I need a stored procedure, but I'm not familiar with the stored procedure syntax/limitations for vertica. Can you help? Here's what I have:

-- Create a new table. "id" is auto-incremented and "name" must be unique
CREATE TABLE IF NOT EXISTS my_table (
id IDENTITY
, name varchar(50) UNIQUE NOT NULL
, type varchar(20)
, description varchar(200)
);

--Insert a record
begin; 
copy my_table from stdin
abort on error
NO COMMIT; -- this begins the load
name1|type1|description1 --this is the load
\. -- this closes the load
commit;

-- insert the duplicate record
begin; 
copy my_table from stdin
abort on error
NO COMMIT; -- this begins the load
name1|type1|description1 --this is the load
\. -- this closes the load
commit; -- Surprisingly, the load executes successfully! What's going on?!?!

-- Check constraints. We see that there is a failed constraints:
select analyze_constraints('my_table');

My thinking is to do some conditional logic. Psudo-code is below. Can you help me prepare it for Vertica?

Begin
load data
if (select count(*) from (select analyze_constraints('my_table')) sub) == 0:
commit
else rollback
like image 596
Peter Avatar asked Sep 28 '12 23:09

Peter


1 Answers

-- Start by Setting Vertica up to rollback and return an error code 
-- if an error is encountered.

\set ON_ERROR_STOP on

-- Load Data here (code omitted since you already have this)


-- Raise an Error condition by selecting 1/0 if any rows were rejected
-- during the load
SELECT
         GET_NUM_REJECTED_ROWS() AS NumRejectedRows
        ,GET_NUM_ACCEPTED_ROWS() AS NumAcceptedRows
;

SELECT 1 / (1-SIGN(GET_NUM_REJECTED_ROWS()));


-- Raise an Error condition if there are duplicates in my_table
SELECT 1 / ( 1 - SIGN( COUNT(*) ) )
FROM ( SELECT  name1,type1,description1
         FROM MY_TABLE
       GROUP BY 1,2,3
       HAVING COUNT(*) > 1 ) AS T1 ;

-- Raise an Error if primary key constraint is violated.
SELECT 1 / ( 1 - SIGN( COUNT(*) ) )
FROM (SELECT  ANALYZE_CONSTRAINTS ('my_table')) AS T1;

COMMIT;    
like image 69
Doug H. Avatar answered Oct 03 '22 09:10

Doug H.