I have a column with a default constraint in my Redshift table so that the current timestamp will be populated for it.
CREATE TABLE test_table(
...
etl_date_time timestamp DEFAULT GETDATE(),
...
);
This works as expected on INSERTs, but I still get NULL values when copying a JSON file from S3 that has no key for this column:
COPY test_table FROM 's3://bucket/test_file.json'
CREDENTIALS '...' FORMAT AS JSON 'auto';
-- There shouldn't be any NULLs here, but there are
select count(*) from test_table where etl_date_time is null;
I have also tried putting a null value for the key in the source JSON, but that resulted in NULL values in the table as well.
{
...
"etl_date_time": null,
...
}
If the field is always NULL, consider omitting it from the files on S3 altogether. COPY lets you specify the columns you intend to copy and will populate the missing ones with their DEFAULT values.
So for the file data.json:
{"col1":"r1_val1", "col3":"r1_val2"}
{"col1":"r2_val1", "col3":"r2_val2"}
And the table definition:
create table _test (
col1 varchar(20)
, col2 timestamp default getdate()
, col3 varchar(20)
);
The COPY command with explicit column names
copy _test(col1,col3) from 's3://bucket/data.json' format as json 'auto'
Would yield the following result:
db=# select * from _test;
col1 | col2 | col3
---------+---------------------+---------
r1_val1 | 2016-07-27 18:27:08 | r1_val2
r2_val1 | 2016-07-27 18:27:08 | r2_val2
(2 rows)
If the column names are omitted,
copy _test from 's3://bucket/data.json' format as json 'auto'
would never use the DEFAULT but insert NULL instead:
db=# select * from _test;
col1 | col2 | col3
---------+---------------------+---------
r1_val1 | | r1_val2
r2_val1 | | r2_val2
(2 rows)
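If listing the columns explicitly is not practical, one possible workaround is to backfill the defaulted column right after the load. This is not part of the approach above, only a minimal sketch that assumes the same _test table and data.json file, and that no row is meant to keep a NULL in col2. Credentials are left out here, as in the COPY commands above.

```sql
-- Sketch only: load without a column list, then fill in the NULLs
-- that COPY wrote instead of the column DEFAULT.
begin;

copy _test from 's3://bucket/data.json' format as json 'auto';

-- Assumption: every NULL in col2 came from the load and should
-- receive the current timestamp.
update _test
set col2 = getdate()
where col2 is null;

commit;
```

The explicit column list is probably still the simpler choice when the set of keys in the file is known up front, since it avoids the extra UPDATE pass over the table.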