Loading JSON data to AWS Redshift results in NULL values

I am trying to perform a load/copy operation to import data from JSON files in an S3 bucket directly to Redshift. The COPY operation succeeds, and after the COPY, the table has the correct number of rows/records, but every record is NULL!

The load takes the expected amount of time, the COPY command returns OK, and the Redshift console reports success with no errors... but a simple query against the table returns only NULL values.

The JSON is simple and flat, and it is formatted correctly (according to the examples I found here: http://docs.aws.amazon.com/redshift/latest/dg/r_COPY_command_examples.html).

Basically, it is one row per line, formatted like:

{ "col1": "val1", "col2": "val2", ... }
{ "col1": "val1", "col2": "val2", ... }
{ "col1": "val1", "col2": "val2", ... }

I have tried things like rewriting the schema based on values and data types found in the JSON objects and also copying from uncompressed files. I thought perhaps the JSON was not being parsed correctly upon load, but it should presumably raise an error if the objects cannot be parsed.

My COPY command looks like this:

copy events from 's3://mybucket/json/prefix' 
with credentials 'aws_access_key_id=xxx;aws_secret_access_key=xxx'
json 'auto' gzip;

Any guidance would be appreciated! Thanks.

675

asked Jun 30 '15 01:06

shane

1 Answer

I have discovered the cause. It would not have been evident from the description I provided in my original post.

When you create a table in Redshift, the column names are converted to lowercase. When you perform a COPY with json 'auto', however, the keys in the JSON objects are matched against those column names case-sensitively.

The input data that I have been trying to load uses camelCase for its keys, so when I perform the COPY, the keys do not match the columns in the defined schema (which now uses all-lowercase column names).
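To make the mismatch concrete, here is a minimal Python sketch that checks one line of input against a table's stored column names using an exact, case-sensitive comparison. The function name `unmatched_keys`, the sample record, and the column list are hypothetical and only for illustration; the check mirrors the matching behaviour described above rather than calling Redshift itself.

```python
import json


def unmatched_keys(json_line, table_columns):
    """Return the JSON keys with no exact (case-sensitive) match in table_columns."""
    record = json.loads(json_line)
    columns = set(table_columns)
    return sorted(key for key in record if key not in columns)


# Hypothetical sample: camelCase keys in the file, lowercase names in the table.
sample_line = '{"userId": "42", "eventType": "click", "ts": "2015-06-30"}'
stored_columns = ["userid", "eventtype", "ts"]

# Any key listed here would be skipped by a case-sensitive match,
# leaving the corresponding column NULL.
print(unmatched_keys(sample_line, stored_columns))
```

Running a check like this on the first line of a file before loading would flag the problem without needing a round trip through COPY.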

The operation does not raise an error, though. It just leaves NULLs in all the columns that did not match (in this case, all of them).
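One possible workaround, sketched here under the assumption that the files can be rewritten before upload, is to lowercase every key so that it matches the stored column names. The helper names `lowercase_keys` and `convert_lines` are hypothetical, and the sketch handles only flat objects like the ones in the question.

```python
import json


def lowercase_keys(json_line):
    """Re-serialize one flat JSON object with all of its keys lowercased."""
    record = json.loads(json_line)
    # Note: keys that differ only by case would collide; the last one wins.
    return json.dumps({key.lower(): value for key, value in record.items()})


def convert_lines(lines):
    """Yield a lowercased-key version of each non-empty line."""
    for line in lines:
        if line.strip():
            yield lowercase_keys(line)


# Hypothetical usage on in-memory lines; a real file would be read line by line.
source_lines = ['{"userId": "42", "eventType": "click"}', ""]
for converted in convert_lines(source_lines):
    print(converted)
```

Alternatives that avoid touching the data may also be worth checking: as far as I know, current Redshift documentation describes a json 'auto ignorecase' option for COPY, and a JSONPaths file can map keys to columns explicitly. Verify both against the COPY documentation for your cluster version.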

Hope this helps somebody to avoid the same confusion!

167

answered Sep 21 '22 09:09

shane

Tags: amazon-web-services, amazon-redshift
