I'm trying to create a validation procedure during a load that checks to make sure data isn't duplicated. Vertica doesn't support this natively:
Vertica checks for constraint violations when queries are run, not when data is loaded. To detect constraint violations as part of the load process, use a COPY (page 667) statement with the NO COMMIT option. By loading data without committing it, you can run a post-load check of your data using the ANALYZE_CONSTRAINTS function. If the function finds constraint violations, you can roll back the load because you have not committed it.
The problem is that I can't figure out how to do this programmatically. I suspect that I need a stored procedure, but I'm not familiar with the stored procedure syntax/limitations for vertica. Can you help? Here's what I have:
-- Create a new table. "id" is auto-incremented and "name" must be unique
CREATE TABLE IF NOT EXISTS my_table (
id IDENTITY
, name varchar(50) UNIQUE NOT NULL
, type varchar(20)
, description varchar(200)
);
--Insert a record
begin;
copy my_table from stdin
abort on error
NO COMMIT; -- this begins the load
name1|type1|description1 --this is the load
\. -- this closes the load
commit;
-- insert the duplicate record
begin;
copy my_table from stdin
abort on error
NO COMMIT; -- this begins the load
name1|type1|description1 --this is the load
\. -- this closes the load
commit; -- Surprisingly, the load executes successfully! What's going on?!?!
-- Check constraints. We see that there is a failed constraints:
select analyze_constraints('my_table');
My thinking is to do some conditional logic. Psudo-code is below. Can you help me prepare it for Vertica?
Begin
load data
if (select count(*) from (select analyze_constraints('my_table')) sub) == 0:
commit
else rollback
-- Start by Setting Vertica up to rollback and return an error code
-- if an error is encountered.
\set ON_ERROR_STOP on
-- Load Data here (code omitted since you already have this)
-- Raise an Error condition by selecting 1/0 if any rows were rejected
-- during the load
SELECT
GET_NUM_REJECTED_ROWS() AS NumRejectedRows
,GET_NUM_ACCEPTED_ROWS() AS NumAcceptedRows
;
SELECT 1 / (1-SIGN(GET_NUM_REJECTED_ROWS()));
-- Raise an Error condition if there are duplicates in my_table
SELECT 1 / ( 1 - SIGN( COUNT(*) ) )
FROM ( SELECT name1,type1,description1
FROM MY_TABLE
GROUP BY 1,2,3
HAVING COUNT(*) > 1 ) AS T1 ;
-- Raise an Error if primary key constraint is violated.
SELECT 1 / ( 1 - SIGN( COUNT(*) ) )
FROM (SELECT ANALYZE_CONSTRAINTS ('my_table')) AS T1;
COMMIT;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With