In Redshift (PostgreSQL), can I skip columns with the COPY command?

  • I have a .csv table (t1) with columns c1, c2, c3 in Amazon S3 storage
  • I want to copy that into Amazon Redshift
  • I create the target table (t1a) with columns c1, c2, c3, all nullable
  • I copy with command:

    copy t1a (c1,c3) from t1

  • I expected it would copy c1 and c3 over from t1 and place the default null value in c2, so a row in t1a might look like (c1_rowX, null, c3_rowX).

  • Instead I get a type error, because it's copying c2 (string type) data from t1 into c3 (int type) of t1a.

  • the copy command works fine when I don't specify the columns:

    copy t1a from t1

  • I've included a link to the Redshift COPY command documentation:

http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html

  • Main question: is there an issue with how I'm specifying the columns? (A minimal sketch of my setup follows below.) Thanks.
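
For reference, a minimal version of my setup (c1's type is illustrative, and the S3 path and IAM role below are placeholders):

    -- Target table: all columns nullable; c2 is a string, c3 an int
    CREATE TABLE t1a (
      c1 VARCHAR(256),
      c2 VARCHAR(256),
      c3 INT
    );

    -- The column-list COPY that raises the type error in my setup
    COPY t1a (c1, c3)
    FROM 's3://my-bucket/t1.csv'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    CSV;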
asked Jun 11 '13 by Elm

People also ask

What does the COPY command do in Redshift?

The COPY command appends the new input data to any existing rows in the table. The maximum size of a single input row from any source is 4 MB. To use the COPY command, you must have INSERT privilege for the Amazon Redshift table.

What is the most efficient and fastest way to load data into Redshift?

A COPY command is the most efficient way to load a table. You can also add data to your tables using INSERT commands, though it is much less efficient than using COPY. The COPY command is able to read from multiple data files or multiple data streams simultaneously.


2 Answers

If you want to skip the preprocessing step, you can define the column to be skipped as CHAR(1) and then use the TRUNCATECOLUMNS parameter with the COPY command:

CREATE TABLE t1a (
  c1 VARCHAR(256),  -- whatever type c1 actually needs
  c2 CHAR(1),       -- the column to "skip": at most one character survives
  c3 INT            -- int type, per the question
);
COPY t1a FROM t1 TRUNCATECOLUMNS;

During the import, TRUNCATECOLUMNS discards any data longer than the column's defined width, so everything in that column is truncated to a single character.
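
In real Redshift syntax the COPY source is an S3 path plus credentials rather than a table name; a minimal sketch of the same command, assuming a placeholder bucket and IAM role:

    COPY t1a
    FROM 's3://my-bucket/t1.csv'  -- placeholder S3 location
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'  -- placeholder role ARN
    CSV
    TRUNCATECOLUMNS;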

That's just a hack; preprocessing the input file is recommended, but sometimes a hack is all that's needed.

answered by Tomasz Tybulewicz


Detailed column-mappings are not supported in COPY directly, unless Amazon/ParAccel have added something extra to their PostgreSQL fork that isn't in mainline.

What you typically do in this case is have a script massage/pad the data: read the CSV, transform it as desired, then send the lines through the PostgreSQL connection to a COPY ... FROM STDIN command.

Alternatively, you might COPY the whole original CSV into a staging table, then transform that data with an INSERT INTO ... SELECT into the real target table, as sketched below.
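
A minimal sketch of that staging approach using the question's column names (the staging types are assumptions, and the S3 path and IAM role are placeholders):

    -- Staging table mirroring the raw CSV layout, all columns as text
    CREATE TABLE t1_staging (
      c1 VARCHAR(256),
      c2 VARCHAR(256),
      c3 VARCHAR(256)
    );

    -- Load the entire CSV into the staging table
    COPY t1_staging
    FROM 's3://my-bucket/t1.csv'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    CSV;

    -- Copy only the wanted columns into the target; c2 is left out
    INSERT INTO t1a (c1, c3)
    SELECT c1, c3::INT
    FROM t1_staging;

Since c2 is never named in the INSERT column list, each inserted row gets NULL there, which is exactly the (c1_rowX, null, c3_rowX) shape the question asked for.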

answered by Craig Ringer