Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

"ERROR: extra data after last expected column" when using PostgreSQL COPY

Please bear with me as this is my first post.

I'm trying to run the COPY command in PostgreSQL-9.2 to add a tab delimited table from a .txt file to a PostgreSQL database such as:

COPY raw_data FROM '/home/Projects/TestData/raw_data.txt' WITH (DELIMITER ' ');

I've already created an empty table called "raw_data" in the database using the SQL command:

CREATE TABLE raw_data ();

I keep getting the following error message when trying to run the COPY command:

ERROR:  extra data after last expected column
CONTEXT:  COPY raw_data, line 1: "  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  ..."

(The numbers here are supposed to be the column headings)

I'm not sure if its because I didn't specify table columns when creating the db table but I'm trying to avoid having to manually enter in 800 or columns.

Any suggestions on how to fix this?

Here's an example of what the .txt file looks like:

        1   2   3   4   5   6   7   8   9
binary1 1   1   0   1   1   1   1   1   1
binary2 1   0   0   1   0   1   1   0   0
binary3 1   0   1   1   1   0   0   1   0
binary4 1   1   1   1   0   1   0   1   0
like image 981
dnak Avatar asked May 03 '13 20:05

dnak


1 Answers

An empty table won't do. You need table that matches the structure of input data. Something like:

CREATE TABLE raw_data (
  col1 int
, col2 int
  ...
);

You don't need to declare tab as DELIMITER since that's the default:

COPY raw_data FROM '/home/Projects/TestData/raw_data.txt';

800 columns you say? That many columns would typically indicate a problem with your design. Anyway, there are ways to half-automate the CREATE TABLE script.

Automation

Assuming simplified raw data

1   2   3   4  -- first row contains "column names"
1   1   0   1  -- tab separated
1   0   0   1
1   0   1   1

Define a different DELIMITER (one that does not occur in the import data at all), and import to a temporary staging table with a single text column:

CREATE TEMP TABLE tmp_data (raw text);

COPY tmp_data FROM '/home/Projects/TestData/raw_data.txt' WITH (DELIMITER '§');

This query creates the CREATE TABLE script:

SELECT 'CREATE TABLE tbl (col' || replace (raw, E'\t', ' bool, col') || ' bool)'
FROM   (SELECT raw FROM tmp_data LIMIT 1) t;

A more generic & safer query:

SELECT 'CREATE TABLE tbl('
    ||  string_agg(quote_ident('col' || col), ' bool, ' ORDER  BY ord)
    || ' bool);'
FROM  (SELECT raw FROM tmp_data LIMIT 1) t
     , unnest(string_to_array(t.raw, E'\t')) WITH ORDINALITY c(col, ord);

Returns:

CREATE TABLE tbl (col1 bool, col2 bool, col3 bool, col4 bool);

Execute after verifying validity - or execute dynamically if you trust the result:

DO
$$BEGIN
EXECUTE (
   SELECT 'CREATE TABLE tbl (col' || replace(raw, ' ', ' bool, col') || ' bool)'
   FROM  (SELECT raw FROM tmp_data LIMIT 1) t
   );
END$$;

Then INSERT the data with this query:

INSERT INTO tbl
SELECT (('(' || replace(replace(replace(
                  raw
                , '1',   't')
                , '0',   'f')
                , E'\t', ',')
             || ')')::tbl).*
FROM   (SELECT raw FROM tmp_data OFFSET 1) t;

Or simpler with translate():

INSERT INTO tbl
SELECT (('(' || translate(raw, E'10\t', 'tf,') || ')')::tbl).*
FROM   (SELECT raw FROM tmp_data OFFSET 1) t;

The string is converted into a row literal, cast to the newly created table row type and decomposed with (row).*.

All done.

You could put all of that into a plpgsql function, but you'd need to safeguard against SQL injection. (There are a number of related solutions here on SO. Try a search.

db<>fiddle here
Old SQL Fiddle

like image 135
Erwin Brandstetter Avatar answered Oct 14 '22 07:10

Erwin Brandstetter