Redshift DEFAULT GETDATE() working on INSERT but not COPY

Tags:

amazon-web-services

amazon-redshift

I have a column with a DEFAULT constraint in my Redshift table so that it is populated with the current timestamp.

CREATE TABLE test_table(
    ...
    etl_date_time timestamp DEFAULT GETDATE(),
    ...
);

This works as expected on INSERTs, but I still get NULL values when copying a JSON file from S3 that has no key for this column:

COPY test_table FROM 's3://bucket/test_file.json' 
CREDENTIALS '...' FORMAT AS JSON 'auto';

-- There shouldn't be any NULLs here, but there are
select count(*) from test_table where etl_date_time is null;

I have also tried putting a null value for the key in the source JSON, but that resulted in NULL values in the table as well.

{
    ...
    "etl_date_time": null,
    ...
}

asked Jul 27 '16 17:07

csab

1 Answer

If the field is always NULL, consider omitting it from the files on S3 altogether. COPY lets you specify the columns you intend to load and populates the columns left out of that list with their DEFAULT values.

So for the file data.json:

{"col1":"r1_val1", "col3":"r1_val2"}
{"col1":"r2_val1", "col3":"r2_val2"}

And the table definition:

create table _test (
    col1 varchar(20)
  , col2 timestamp default getdate()
  , col3 varchar(20)
);

Specific column names

A COPY command with explicit column names that leaves out col2:

copy _test(col1,col3) from 's3://bucket/data.json' format as json 'auto'

It yields the following result:

db=# select * from _test;
  col1   |        col2         |  col3
---------+---------------------+---------
 r1_val1 | 2016-07-27 18:27:08 | r1_val2
 r2_val1 | 2016-07-27 18:27:08 | r2_val2
(2 rows)
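
Applied to the table from the question, the same approach might look like the sketch below. The question elides the table's other columns, so `id` and `payload` are hypothetical stand-ins: list every column that the JSON actually supplies and leave out `etl_date_time`.

```sql
-- Hypothetical column names (id, payload): the question does not show the
-- real ones. etl_date_time is deliberately left out of the column list so
-- that COPY fills it from DEFAULT GETDATE().
COPY test_table (id, payload)
FROM 's3://bucket/test_file.json'
CREDENTIALS '...'
FORMAT AS JSON 'auto';

-- Check for rows that still have no load timestamp.
SELECT COUNT(*) FROM test_table WHERE etl_date_time IS NULL;
```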

Omitted column names

If the column list is omitted, every column of the table becomes a load target:

copy _test from 's3://bucket/data.json' format as json 'auto'

COPY then loads NULL for any column that has no matching key in the JSON and never applies the DEFAULT:

db=# select * from _test;
  col1   |        col2         |  col3
---------+---------------------+---------
 r1_val1 |                     | r1_val2
 r2_val1 |                     | r2_val2
(2 rows)
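
Rows that were already loaded without a column list keep their NULLs, since a later COPY does not touch existing rows. One possible repair, sketched here against the question's `test_table`, is a plain UPDATE. Note the assumption that an approximate timestamp is acceptable: this stamps the time of the repair, not the time of the original load.

```sql
-- Backfill rows that an earlier COPY (without a column list) loaded as NULL.
-- GETDATE() here is evaluated when the UPDATE runs, not when the rows
-- were first loaded.
UPDATE test_table
SET etl_date_time = GETDATE()
WHERE etl_date_time IS NULL;

-- Re-run the check from the question to confirm no NULLs remain.
SELECT COUNT(*) FROM test_table WHERE etl_date_time IS NULL;
```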

answered Sep 22 '22 23:09

moertel
