Please bear with me as this is my first post.
I'm trying to run the COPY command in PostgreSQL 9.2 to import a tab-delimited table from a .txt file into a PostgreSQL database, like this:
COPY raw_data FROM '/home/Projects/TestData/raw_data.txt' WITH (DELIMITER E'\t');
I've already created an empty table called "raw_data" in the database using the SQL command:
CREATE TABLE raw_data ();
I keep getting the following error message when trying to run the COPY command:
ERROR: extra data after last expected column
CONTEXT: COPY raw_data, line 1: " 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 ..."
(The numbers here are supposed to be the column headings.)
I'm not sure if it's because I didn't specify table columns when creating the db table, but I'm trying to avoid having to manually enter 800 or so columns.
Any suggestions on how to fix this?
Here's an example of what the .txt file looks like:
1 2 3 4 5 6 7 8 9
binary1 1 1 0 1 1 1 1 1 1
binary2 1 0 0 1 0 1 1 0 0
binary3 1 0 1 1 1 0 0 1 0
binary4 1 1 1 1 0 1 0 1 0
An empty table won't do. You need a table that matches the structure of the input data. Something like:
CREATE TABLE raw_data (
col1 int
, col2 int
...
);
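To see why the empty table fails, here is a rough Python model of the per-line check COPY performs: each line is split on the delimiter and every field needs a target column. This is a sketch, not PostgreSQL's actual code; `copy_line_error` is a hypothetical helper, and the "missing data" message is abbreviated (PostgreSQL also names the column).

```python
from typing import Optional


def copy_line_error(line: str, n_columns: int, delimiter: str = "\t") -> Optional[str]:
    """Rough model of COPY's field-count check (hypothetical helper)."""
    fields = line.split(delimiter)
    if len(fields) > n_columns:
        # More fields than columns: a zero-column table fails on the first field
        return "extra data after last expected column"
    if len(fields) < n_columns:
        # Fewer fields than columns (PostgreSQL's message also names the column)
        return "missing data for column"
    return None  # field count matches the table definition


# CREATE TABLE raw_data (); has zero columns
print(copy_line_error("1\t2\t3\t4", 0))  # → extra data after last expected column
# A four-column table accepts a four-field line
print(copy_line_error("1\t1\t0\t1", 4))  # → None
```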
You don't need to declare tab as DELIMITER since that's the default:
COPY raw_data FROM '/home/Projects/TestData/raw_data.txt';
800 columns, you say? That many columns would typically indicate a problem with your design. Anyway, there are ways to half-automate the CREATE TABLE script.
Assuming simplified raw data:
1 2 3 4 -- first row contains "column names"
1 1 0 1 -- tab separated
1 0 0 1
1 0 1 1
Define a different DELIMITER (one that does not occur in the import data at all), and import to a temporary staging table with a single text column:
CREATE TEMP TABLE tmp_data (raw text);
COPY tmp_data FROM '/home/Projects/TestData/raw_data.txt' WITH (DELIMITER '|');  -- must be a single one-byte character absent from the data
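The staging trick works because a delimiter that never appears in the data leaves each line as exactly one field, tabs included. A minimal Python model of that, assuming `'|'` as the unused delimiter; `stage_lines` is a hypothetical helper and the sample text is the simplified raw data above.

```python
from typing import List


def stage_lines(text: str, delimiter: str = "|") -> List[str]:
    """Model loading each line into a one-column staging table (hypothetical helper)."""
    staged = []
    for line in text.splitlines():
        fields = line.split(delimiter)
        if len(fields) != 1:
            # The delimiter appeared in the data, so the line would not fit one column
            raise ValueError("delimiter occurs in the data; choose another one")
        staged.append(fields[0])  # the whole line, tabs included, becomes one value
    return staged


raw_file = "1\t2\t3\t4\n1\t1\t0\t1\n1\t0\t0\t1\n1\t0\t1\t1"
rows = stage_lines(raw_file)
header, data = rows[0], rows[1:]  # like LIMIT 1 / OFFSET 1 in the queries that follow
print(repr(header))  # → '1\t2\t3\t4'
print(len(data))     # → 3
```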
This query creates the CREATE TABLE script:
SELECT 'CREATE TABLE tbl (col' || replace (raw, E'\t', ' bool, col') || ' bool)'
FROM (SELECT raw FROM tmp_data LIMIT 1) t;
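The string manipulation in that query is easy to trace outside the database. Here is a Python sketch of the same steps (prefix `col`, turn every tab into ` bool, col`, close with ` bool)`); `create_table_script` is a hypothetical name, and `tbl`/`bool` are the defaults used in the answer.

```python
def create_table_script(header: str, table: str = "tbl", col_type: str = "bool") -> str:
    """Build the CREATE TABLE statement the same way the SQL replace() query does."""
    return (
        f"CREATE TABLE {table} (col"
        + header.replace("\t", f" {col_type}, col")  # each tab starts a new column
        + f" {col_type})"
    )


print(create_table_script("1\t2\t3\t4"))
# → CREATE TABLE tbl (col1 bool, col2 bool, col3 bool, col4 bool)
```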
A more generic and safer query (WITH ORDINALITY requires PostgreSQL 9.4 or later):
SELECT 'CREATE TABLE tbl('
|| string_agg(quote_ident('col' || col), ' bool, ' ORDER BY ord)
|| ' bool);'
FROM (SELECT raw FROM tmp_data LIMIT 1) t
, unnest(string_to_array(t.raw, E'\t')) WITH ORDINALITY c(col, ord);
Returns:
CREATE TABLE tbl (col1 bool, col2 bool, col3 bool, col4 bool);
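The safer variant splits the header, quotes each name, and joins in file order. A Python sketch of that logic, assuming a simplified `quote_ident` that leaves names of lowercase letters, digits and underscores unquoted and double-quotes everything else; unlike PostgreSQL's real quote_ident(), it does not check reserved keywords. The header `Total Score` is a hypothetical example.

```python
import re

# Names made only of lowercase letters, digits and underscores
# (not starting with a digit) stay unquoted.
_SAFE_IDENT = re.compile(r"[a-z_][a-z0-9_]*")


def quote_ident(name: str) -> str:
    """Simplified stand-in for PostgreSQL's quote_ident() (no keyword check)."""
    if _SAFE_IDENT.fullmatch(name):
        return name
    return '"' + name.replace('"', '""') + '"'


def create_table_script(header: str, table: str = "tbl", col_type: str = "bool") -> str:
    # string_to_array(raw, E'\t') ~ str.split("\t"); list order stands in for ORDER BY ord
    cols = [quote_ident("col" + c) for c in header.split("\t")]
    return f"CREATE TABLE {table}(" + f" {col_type}, ".join(cols) + f" {col_type});"


print(create_table_script("1\t2\t3\t4"))
# → CREATE TABLE tbl(col1 bool, col2 bool, col3 bool, col4 bool);
print(create_table_script("1\tTotal Score"))
# → CREATE TABLE tbl(col1 bool, "colTotal Score" bool);
```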
Execute it after verifying that it is valid, or execute it dynamically if you trust the result:
DO
$$BEGIN
EXECUTE (
SELECT 'CREATE TABLE tbl (col' || replace(raw, E'\t', ' bool, col') || ' bool)'
FROM (SELECT raw FROM tmp_data LIMIT 1) t
);
END$$;
Then INSERT the data with this query:
INSERT INTO tbl
SELECT (('(' || replace(replace(replace(
raw
, '1', 't')
, '0', 'f')
, E'\t', ',')
|| ')')::tbl).*
FROM (SELECT raw FROM tmp_data OFFSET 1) t;
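Here is what the nested replace() calls do to a single staged line, traced in Python. `row_literal` follows the SQL steps; `parse_bool_row` is a hypothetical, rough model of the `::tbl` cast for all-boolean columns. Both assume every field is exactly `0` or `1`, as in the sample data.

```python
def row_literal(raw: str) -> str:
    """Same steps as the SQL: '1' -> 't', '0' -> 'f', tab -> ',', wrapped in parentheses.

    Assumes every field is a single '0' or '1' (a value like '10' would become 'tf').
    """
    return "(" + raw.replace("1", "t").replace("0", "f").replace("\t", ",") + ")"


def parse_bool_row(literal: str) -> tuple:
    """Rough model of casting the row literal to an all-boolean row type (hypothetical)."""
    return tuple(field == "t" for field in literal[1:-1].split(","))


lit = row_literal("1\t1\t0\t1")
print(lit)                  # → (t,t,f,t)
print(parse_bool_row(lit))  # → (True, True, False, True)
```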
Or simpler, with translate():
INSERT INTO tbl
SELECT (('(' || translate(raw, E'10\t', 'tf,') || ')')::tbl).*
FROM (SELECT raw FROM tmp_data OFFSET 1) t;
The string is converted into a row literal, cast to the newly created table's row type and decomposed with (row).*.
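translate() replaces characters one for one in a single pass, which is why one call covers all three substitutions. Python's `str.maketrans`/`str.translate` behave the same way for this mapping; a sketch, with `row_literal` as a hypothetical helper name:

```python
# Map characters one to one, like SQL translate(raw, E'10\t', 'tf,'):
# '1' -> 't', '0' -> 'f', tab -> ','
_TO_LITERAL = str.maketrans("10\t", "tf,")


def row_literal(raw: str) -> str:
    """Build a row literal such as '(t,f,t,t)' from a tab-separated line of 0/1 values."""
    return "(" + raw.translate(_TO_LITERAL) + ")"


print(row_literal("1\t0\t1\t1"))  # → (t,f,t,t)
```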
All done.
You could put all of that into a plpgsql function, but you'd need to safeguard against SQL injection. (There are a number of related solutions here on SO; try a search.)