I am using psql with a PostgreSQL database and the following copy command:
\COPY isa (np1, np2, sentence) FROM 'c:\Downloads\isa.txt' WITH DELIMITER '|'
I get:
ERROR: extra data after last expected column
How can I skip the lines with errors?
You cannot skip errors without aborting the whole command, up to and including Postgres 14. There is currently no more sophisticated error handling for COPY.
\copy is just a wrapper around SQL COPY that channels results through psql. The manual for COPY:
COPY stops operation at the first error. This should not lead to problems in the event of a COPY TO, but the target table will already have received earlier rows in a COPY FROM. These rows will not be visible or accessible, but they still occupy disk space. This might amount to a considerable amount of wasted disk space if the failure happened well into a large copy operation. You might wish to invoke VACUUM to recover the wasted space.
Bold emphasis mine. And:
COPY FROM will raise an error if any line of the input file contains more or fewer columns than are expected.
COPY is an extremely fast way to import / export data. Sophisticated checks and error handling would slow it down.
There was an attempt to add error logging to COPY in Postgres 9.0 but it was never committed.
Fix your input file instead.
If you have one or more additional columns in your input file and the file is otherwise consistent, you might add dummy columns to your table isa and drop them afterwards. Or (cleaner with production tables) import into a temporary staging table and INSERT selected columns (or expressions) into your target table isa from there.
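For example, a minimal sketch of the staging-table route, assuming the input file consistently has one extra trailing column (the staging table and the "extra" column name are illustrative):

CREATE TEMP TABLE isa_staging (np1 text, np2 text, sentence text, extra text);  -- "extra" absorbs the surplus field

\copy isa_staging FROM 'c:\Downloads\isa.txt' WITH DELIMITER '|'

INSERT INTO isa (np1, np2, sentence)
SELECT np1, np2, sentence
FROM   isa_staging;

DROP TABLE isa_staging;  -- optional: temp tables are dropped at the end of the session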
Related answers with detailed instructions:
It is too bad that after 25 years Postgres still doesn't have an ignore-errors flag or option for the COPY command. In this era of Big Data you get a lot of dirty records, and it can be very costly for the project to fix every outlier.
I had to make a work-around this way:
Create a dummy table dummy_original_table with the same columns as original_table.
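A minimal sketch of that dummy table, assuming original_table already exists with the columns used below (LIKE copies column names and types only):

CREATE TABLE dummy_original_table (LIKE original_table);

Then define a trigger function on the dummy table that forwards each row to original_table and silently discards anything that fails: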
CREATE OR REPLACE FUNCTION on_insert_in_original_table() RETURNS trigger AS $$
DECLARE
    v_rec RECORD;
BEGIN
    -- the trigger prevents 'duplicate key' errors by returning NULL for rows that already exist
    SELECT * FROM original_table WHERE primary_key = NEW.primary_key INTO v_rec;
    IF v_rec IS NOT NULL THEN
        RETURN NULL;
    END IF;
    BEGIN
        INSERT INTO original_table(datum, primary_key) VALUES (NEW.datum, NEW.primary_key)
        ON CONFLICT DO NOTHING;
    EXCEPTION
        WHEN OTHERS THEN
            NULL;  -- swallow any other error for this row
    END;
    RETURN NULL;  -- never store the row in the dummy table itself
END;
$$ LANGUAGE plpgsql;
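The trigger creation itself is not shown above; a plausible definition, assuming the function should fire for every row loaded into the dummy table (the trigger name is arbitrary):

-- use EXECUTE PROCEDURE instead of EXECUTE FUNCTION on Postgres 10 or older
CREATE TRIGGER forward_to_original
    BEFORE INSERT ON dummy_original_table
    FOR EACH ROW EXECUTE FUNCTION on_insert_in_original_table();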
Then load the file into the dummy table (the \copy argument needs shell quoting):

psql dbname -c "\copy dummy_original_table(datum, primary_key) FROM '/home/user/data.csv' delimiter E'\t'"