Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to make Postgres Copy ignore first line of large txt file

I have a fairly large .txt file ~9gb and I will like to load this txt file into postgres. The first row is the header, followed by all the data. If I postgres COPY the data directly, the header will cause an error that data type do not match with my postgres table, so I will need to remove it somehow.

Sample data: ProjectId,MailId,MailCodeId,prospectid,listid,datemailed,amount,donated,zip,zip4,VectorMajor,VectorMinor,packageid,phase,databaseid,amount2

15,53568419,89734,219906,15,2011-05-11 00:00:00,0,0,90720,2915,NonProfit,POLICY,230,3,1,0  16,84141863,87936,164657,243,2011-03-10 00:00:00,0,0,48362,2523,NonProfit,POLICY,1507,5,1,0  16,81442028,86632,15181625,243,2011-01-19 00:00:00,0,0,11501,2115,NonProfit,POLICY,1508,2,1,0 

While the COPY function for postgres has the "header" setting that can ignore the first row, it only works for csv files:

copy training from 'C:/testCSV.csv' DELIMITER ',' csv header; 

when I try to run the code above on my txt file, it gets an error:

copy training from 'C:/testTXTFile.txt' DELIMITER ',' csv header ERROR:  unquoted newline found in data HINT:  Use quoted CSV field to represent newline. 

I have tried adding "quote" and "escape" attributes but the command just won't seem to work for txt file:

copy training from 'C:/testTXTFile.txt' DELIMITER ',' csv header quote as E'"'  escape as E'\\N'; ERROR:  COPY escape must be a single one-byte character 

Alternatively, I thought about running java or create a seperate stagging table to remove the first row...but these solutions are expansive and time consuming. I will need to load 9gb of data just to remove the first row of headers... are there other solutions out there to remove the first row of a txt file easily so that I can load the data into my postgres database?

like image 428
thiakx Avatar asked Aug 02 '12 09:08

thiakx


People also ask

Does Postgres copy overwrite?

If you COPY data into a table already containing data, the new data will be appended. If you COPY TO a file already containing data, the existing data will be overwritten.

What is PSQL's \copy?

Psql \copy command is used when you want to export the data from Postgres table to a CSV file on a client machine. To use this command, you will need access to the psql prompt. You will understand it more with the following psql copy examples. To copy the entire table to a csv file, use \copy.

When loading data into a postgresql table what is the default delimiter for data files?

The default is a tab character in text format, a comma in CSV format. This must be a single one-byte character.


1 Answers

Use HEADER option with CSV option:

\copy <table_name>  from '/source_file.csv' delimiter ',' CSV HEADER ; 

HEADER Specifies that the file contains a header line with the names of each column in the file. On output, the first line contains the column names from the table, and on input, the first line is ignored. This option is allowed only when using CSV format.

like image 152
Edmon Avatar answered Sep 29 '22 04:09

Edmon