Following is the use case i am working on: I have configured <code>enable Streams</code> when creating <code>DynamoDB</code> with <code>new and old Image</code>.I have created a <code>Kinesis Firehose delivery stream</code> with Destination as <code>Redshift</code>(<code>Intermediate s3</code>). From Dynamodb my stream reaches Firhose and from there to the Bucket as JSON (S3 Bucket -Gzip)given below. My Problem is i <code>cannot COPY this JSON to redshift</code>. Things i am not able to get: <ul> <li><ol> <li>Not Sure what should be the Create table Statement in Redshift</li> </ol></li> <li><ol start="2"> <li>What should be the COPY Syntax in Kinesis firhose.</li> </ol></li> <li><ol start="3"> <li>How should i use JsonPaths here. Kinesis Data firehouse set to return only json to my s3 bucket.</li> </ol></li> <li><ol start="4"> <li>How to mention the Maniphest in the COPY Command </li> </ol></li> </ul> JSON Load to S3 is shown Below: <pre class="prettyprint"><code>{ "Keys": { "vehicle_id": { "S": "x011" } }, "NewImage": { "heart_beat": { "N": "0" }, "cdc_id": { "N": "456" }, "latitude": { "N": "1.30951" }, "not_deployed_counter": { "N": "1" }, "reg_ind": { "N": "0" }, "operator": { "S": "x" }, "d_dttm": { "S": "11/08/2018 2:43:46 PM" }, "z_id": { "N": "1267" }, "last_end_trip_dttm": { "S": "11/08/2018 1:43:46 PM" }, "land_ind": { "N": "1" }, "s_ind": { "N": "1" }, "status_change_dttm": { "S": "11/08/2018 2:43:46 PM" }, "case_ind": { "N": "1" }, "last_po_change_dttm": { "S": "11/08/2018 2:43:46 PM" }, "violated_duration": { "N": "20" }, "vehicle_id": { "S": "x011" }, "longitude": { "N": "103.7818" }, "file_status": { "S": "Trip_Start" }, "unhired_duration": { "N": "10" }, "eo_lat": { "N": "1.2345" }, "reply_eo_ind": { "N": "1" }, "license_ind": { "N": "0" }, "indiscriminately_parked_ind": { "N": "0" }, "eo_lng": { "N": "102.8978" }, "officer_id": { "S": "xxxx@gmail.com" }, "case_status": { "N": "0" }, "color_status_cd": { "N": "0" }, "parking_id": { "N": "2345" }, "ttr_dttm": { "S": "11/08/2018 2:43:46 PM" }, "deployed_ind": { "N": "1" }, "status": { "S": "PI" } }, "SequenceNumber": "1200000000000956615967", "SizeBytes": 570, "ApproximateCreationDateTime": 1535513040, "eventName": "INSERT" } </code></pre> My Create table Statement : <pre class="prettyprint"><code>create table vehicle_status( heart_beat integer, cdc_id integer, latitude integer, not_deployed_counter integer, reg_ind integer, operator varchar(10), d_dttm varchar(30), z_id integer, last_end_trip_dttm varchar(30), land_ind integer, s_ind integer, status_change_dttm varchar(30), case_ind integer, last_po_change_dttm varchar(30), violated_duration integer, vehicle_id varchar(8), longitude integer, file_status varchar(30), unhired_duration integer, eo_lat integer, reply_eo_ind integer, license_ind integer, indiscriminately_parked_ind integer, eo_lng integer, officer_id varchar(50), case_status integer, color_status_cd integer, parking_id integer, ttr_dttm varchar(30), deployed_ind varchar(3), status varchar(8)); </code></pre> And My Copy Statement (Manually trying to reslove this from Redshift): <pre class="prettyprint"><code>COPY vehicle_status (heart_beat, cdc_id, latitude, not_deployed_counter, reg_ind, operator, d_dttm, z_id, last_end_trip_dttm, land_ind, s_ind, status_change_dttm, case_ind, last_po_change_dttm, violated_duration, vehicle_id, longitude, file_status, unhired_duration, eo_lat, reply_eo_ind, license_ind, indiscriminately_parked_ind, eo_lng, officer_id, case_status, color_status_cd, parking_id, ttr_dttm, deployed_ind, status) FROM 's3://<my-bucket>/2018/08/29/05/vehicle_status_change-2-2018-08-29-05-24-42-092c330b-e14a-4133-bf4a-5982f2e1f49e.gz' CREDENTIALS 'aws_iam_role=arn:aws:iam::<accountnum>:role/<RedshiftRole>' GZIP json 'auto'; </code></pre> When i try the above procedure - i get to Insert the records - but all the columns and rows are null. How can i copy this json format to redhsift. Have been stuck here last 3 days.Any help on this would do. S3 Bucket: <pre class="prettyprint"><code>Amazon S3/<My-bucket>/2018/08/29/05 Amazon S3/<My-bucket>/manifests/2018/08/29/05 </code></pre>

I'm not very much familiar with Amazon, but let me try to answer most of your questions, so that you could move on. Other people are most welcome to edit this answer or additional details. Thank you! <blockquote> Not Sure what should be the Create table Statement in Redshift </blockquote> Your create statement <code>create table vehicle_status(...)</code> has no problem, though you could add <code>distribution key</code>, <code>sort key</code> and <code>encoding</code> based on your requirement, refer more here and here As per AWS Kenesis documents, your table must be present in <code>Redshift</code>, hence you could connect to <code>Redshift</code> using <code>psql</code> command and run the <code>create statement</code> manually. <blockquote> What should be the COPY Syntax in Kinesis firhose. </blockquote> The <code>Copy</code> syntax would remain same either you run it via <code>psql</code> or <code>firhose</code>, luckily the copy script you have come up with works without any error, I tried it in my instance with small modification of direct <code>AWS/SECRET</code> key supply rather then it works fine, here the <code>sql</code> I run that worked fine and copied 1 data record to the table <code>vehicle_status</code>. Actually your json path structure is complex, hence <code>json 'auto'</code> will not work. Here is the working command, I have created a sample <code>jsonpath</code> file for you with 4 example fields and you could follow same structure to create <code>jsonpath</code> file with all the data points. <pre class="prettyprint"><code> COPY vehicle_status (heart_beat, cdc_id, operator, status) FROM 's3://XXX/development/test_file.json' CREDENTIALS 'aws_access_key_id=XXXXXXXXXXXXXXXXX;aws_secret_access_key=MYXXXXXXXXXXXXXXXXXXXXXX' json 's3://XXX/development/yourjsonpathfile'; </code></pre> And your <code>json path file</code> should have content similar of as below. <pre class="prettyprint"><code>{ "jsonpaths": [ "$['NewImage']['heart_beat']['N']", "$['NewImage']['cdc_id']['N']", "$['NewImage']['operator']['S']", "$['NewImage']['status']['S']" ] } </code></pre> I have tested it and it works. <blockquote> How should i use JsonPaths here. Kinesis Data firehouse set to return only json to my s3 bucket. </blockquote> I have used your example <code>json</code> data only and it works, so I see no issue here. <blockquote> How to mention the Maniphest in the COPY Command </blockquote> This is good question, I could try explaining it, I hope, here you are referring <code>menifest</code>. If you see above copy command, it works fine for one file or couple of files, but think it you have lot of files, here comes the concept of <code>menifest</code>. Straight from Amazon docs, "Instead of supplying an object path for the COPY command, you supply the name of a JSON-formatted text file that explicitly lists the files to be loaded." In short, if you want to load multiple files in single shot which is preferred way as well by <code>Redshift</code>, you could create a simple <code>menifest</code> with json and supply the same in copy command. <code>{ "entries": [ {"url":"s3://mybucket-alpha/2013-10-04-custdata", "mandatory":true}, {"url":"s3://mybucket-alpha/2013-10-05-custdata", "mandatory":true},.... ] }</code> upload the menifest to <code>S3</code> and use the same in your copy command like below. <pre class="prettyprint"><code> COPY vehicle_status (heart_beat, cdc_id, latitude, not_deployed_counter, reg_ind, operator, d_dttm, z_id, last_end_trip_dttm, land_ind, s_ind, status_change_dttm, case_ind, last_po_change_dttm, violated_duration, vehicle_id, longitude, file_status, unhired_duration, eo_lat, reply_eo_ind, license_ind, indiscriminately_parked_ind, eo_lng, officer_id, case_status, color_status_cd, parking_id, ttr_dttm, deployed_ind, status) FROM 's3://XXX/development/test.menifest' CREDENTIALS 'aws_access_key_id=XXXXXXXXXXXXXXXXX;aws_secret_access_key=MYXXXXXXXXXXXXXXXXXXXXXX' json 's3://yourbucket/jsonpath' menifest; </code></pre> Here is detail reference for menifest. I hope this gives you some ideas, how to move on and If there is spefic error you see, I would be happy to refocus on the answer.

cannot copy json - Dynamo db Streams to redshift

Tags:

amazon-s3

amazon-dynamodb

amazon-kinesis-firehose

amazon-redshift

amazon-dynamodb-streams

Following is the use case i am working on: I have configured enable Streams when creating DynamoDB with new and old Image.I have created a Kinesis Firehose delivery stream with Destination as Redshift(Intermediate s3).

From Dynamodb my stream reaches Firhose and from there to the Bucket as JSON (S3 Bucket -Gzip)given below. My Problem is i cannot COPY this JSON to redshift.

Things i am not able to get:

1. Not Sure what should be the Create table Statement in Redshift
1. What should be the COPY Syntax in Kinesis firhose.
1. How should i use JsonPaths here. Kinesis Data firehouse set to return only json to my s3 bucket.
1. How to mention the Maniphest in the COPY Command

JSON Load to S3 is shown Below:

Click to copy

{
    "Keys": {
        "vehicle_id": {
            "S": "x011"
        }
    },
    "NewImage": {
        "heart_beat": {
            "N": "0"
        },
        "cdc_id": {
            "N": "456"
        },
        "latitude": {
            "N": "1.30951"
        },
        "not_deployed_counter": {
            "N": "1"
        },
        "reg_ind": {
            "N": "0"
        },
        "operator": {
            "S": "x"
        },
        "d_dttm": {
            "S": "11/08/2018 2:43:46 PM"
        },
        "z_id": {
            "N": "1267"
        },
        "last_end_trip_dttm": {
            "S": "11/08/2018 1:43:46 PM"
        },
        "land_ind": {
            "N": "1"
        },
        "s_ind": {
            "N": "1"
        },
        "status_change_dttm": {
            "S": "11/08/2018 2:43:46 PM"
        },
        "case_ind": {
            "N": "1"
        },
        "last_po_change_dttm": {
            "S": "11/08/2018 2:43:46 PM"
        },
        "violated_duration": {
            "N": "20"
        },
        "vehicle_id": {
            "S": "x011"
        },
        "longitude": {
            "N": "103.7818"
        },
        "file_status": {
            "S": "Trip_Start"
        },
        "unhired_duration": {
            "N": "10"
        },
        "eo_lat": {
            "N": "1.2345"
        },
        "reply_eo_ind": {
            "N": "1"
        },
        "license_ind": {
            "N": "0"
        },
        "indiscriminately_parked_ind": {
            "N": "0"
        },
        "eo_lng": {
            "N": "102.8978"
        },
        "officer_id": {
            "S": "xxxx@gmail.com"
        },
        "case_status": {
            "N": "0"
        },
        "color_status_cd": {
            "N": "0"
        },
        "parking_id": {
            "N": "2345"
        },
        "ttr_dttm": {
            "S": "11/08/2018 2:43:46 PM"
        },
        "deployed_ind": {
            "N": "1"
        },
        "status": {
            "S": "PI"
        }
    },
    "SequenceNumber": "1200000000000956615967",
    "SizeBytes": 570,
    "ApproximateCreationDateTime": 1535513040,
    "eventName": "INSERT"
}

My Create table Statement :

Click to copy

create table vehicle_status(
    heart_beat integer,
    cdc_id integer,
    latitude integer,   
    not_deployed_counter integer,
    reg_ind integer,
    operator varchar(10),
    d_dttm varchar(30),
    z_id integer,
    last_end_trip_dttm varchar(30),
    land_ind integer,
    s_ind integer,
    status_change_dttm varchar(30), 
    case_ind integer,
    last_po_change_dttm varchar(30),    
    violated_duration integer,
    vehicle_id varchar(8),
    longitude integer,  
    file_status varchar(30),
    unhired_duration integer,
    eo_lat integer,                     
    reply_eo_ind integer,
    license_ind integer,    
    indiscriminately_parked_ind integer,
    eo_lng integer,
    officer_id varchar(50),
    case_status integer,
    color_status_cd integer,
    parking_id integer,
    ttr_dttm varchar(30),
    deployed_ind varchar(3),
  status varchar(8));

And My Copy Statement (Manually trying to reslove this from Redshift):

Click to copy

COPY vehicle_status (heart_beat, cdc_id, latitude, not_deployed_counter, reg_ind, operator, d_dttm, z_id, last_end_trip_dttm, land_ind, s_ind, status_change_dttm, case_ind, last_po_change_dttm, violated_duration, vehicle_id, longitude, file_status, unhired_duration, eo_lat, reply_eo_ind, license_ind, indiscriminately_parked_ind, eo_lng, officer_id, case_status, color_status_cd, parking_id, ttr_dttm, deployed_ind, status) 
FROM 's3://<my-bucket>/2018/08/29/05/vehicle_status_change-2-2018-08-29-05-24-42-092c330b-e14a-4133-bf4a-5982f2e1f49e.gz' CREDENTIALS 'aws_iam_role=arn:aws:iam::<accountnum>:role/<RedshiftRole>' GZIP json 'auto';

When i try the above procedure - i get to Insert the records - but all the columns and rows are null.

How can i copy this json format to redhsift. Have been stuck here last 3 days.Any help on this would do.

S3 Bucket:

Click to copy

Amazon S3/<My-bucket>/2018/08/29/05
Amazon S3/<My-bucket>/manifests/2018/08/29/05

958

asked Aug 29 '18 07:08

Prasanna Nandakumar

1 Answers

I'm not very much familiar with Amazon, but let me try to answer most of your questions, so that you could move on. Other people are most welcome to edit this answer or additional details. Thank you!

Not Sure what should be the Create table Statement in Redshift

Your create statement create table vehicle_status(...) has no problem, though you could add distribution key, sort key and encoding based on your requirement, refer more here and here

As per AWS Kenesis documents, your table must be present in Redshift, hence you could connect to Redshift using psql command and run the create statement manually.

What should be the COPY Syntax in Kinesis firhose.

The Copy syntax would remain same either you run it via psql or firhose, luckily the copy script you have come up with works without any error, I tried it in my instance with small modification of direct AWS/SECRET key supply rather then it works fine, here the sql I run that worked fine and copied 1 data record to the table vehicle_status.

Actually your json path structure is complex, hence json 'auto' will not work. Here is the working command, I have created a sample jsonpath file for you with 4 example fields and you could follow same structure to create jsonpath file with all the data points.

Click to copy

 COPY vehicle_status (heart_beat, cdc_id, operator, status) FROM 's3://XXX/development/test_file.json' CREDENTIALS 'aws_access_key_id=XXXXXXXXXXXXXXXXX;aws_secret_access_key=MYXXXXXXXXXXXXXXXXXXXXXX' json 's3://XXX/development/yourjsonpathfile';

And your json path file should have content similar of as below.

Click to copy

{
  "jsonpaths": [
    "$['NewImage']['heart_beat']['N']",
    "$['NewImage']['cdc_id']['N']",
    "$['NewImage']['operator']['S']",
    "$['NewImage']['status']['S']"
  ]
}

I have tested it and it works.

How should i use JsonPaths here. Kinesis Data firehouse set to return only json to my s3 bucket.

I have used your example json data only and it works, so I see no issue here.

How to mention the Maniphest in the COPY Command

This is good question, I could try explaining it, I hope, here you are referring menifest.

If you see above copy command, it works fine for one file or couple of files, but think it you have lot of files, here comes the concept of menifest. Straight from Amazon docs, "Instead of supplying an object path for the COPY command, you supply the name of a JSON-formatted text file that explicitly lists the files to be loaded."

In short, if you want to load multiple files in single shot which is preferred way as well by Redshift, you could create a simple menifest with json and supply the same in copy command.

{ "entries": [ {"url":"s3://mybucket-alpha/2013-10-04-custdata", "mandatory":true}, {"url":"s3://mybucket-alpha/2013-10-05-custdata", "mandatory":true},.... ] }

upload the menifest to S3 and use the same in your copy command like below.

Click to copy

 COPY vehicle_status (heart_beat, cdc_id, latitude, not_deployed_counter, reg_ind, operator, d_dttm, z_id, last_end_trip_dttm, land_ind, s_ind, status_change_dttm, case_ind, last_po_change_dttm, violated_duration, vehicle_id, longitude, file_status, unhired_duration, eo_lat, reply_eo_ind, license_ind, indiscriminately_parked_ind, eo_lng, officer_id, case_status, color_status_cd, parking_id, ttr_dttm, deployed_ind, status) FROM 's3://XXX/development/test.menifest' CREDENTIALS 'aws_access_key_id=XXXXXXXXXXXXXXXXX;aws_secret_access_key=MYXXXXXXXXXXXXXXXXXXXXXX' json 's3://yourbucket/jsonpath' menifest;

Here is detail reference for menifest.

I hope this gives you some ideas, how to move on and If there is spefic error you see, I would be happy to refocus on the answer.

answered Sep 18 '22 13:09

Red Boy

Related questions
                            
                                Getting file size from S3 bucket
                            
                                Invalid Bucket name when creating s3 bucket with AWS SDK
                            
                                How to download files from s3 service to local folder
                            
                                Upload image AWS S3 bucket in swift
                            
                                s3fs unmount: directory is not empty
                            
                                AWS S3 php api : How to get url after file upload?
                            
                                How can I avoid "Networking Error: Network Failure" when downloading a file from AWS S3 bucket?
                            
                                Why is AWS telling me BucketAlreadyExists when it doesn't?
                            
                                model missing required attr_accessor for 'photo_file_name' when uploading with paperclip and S3 on heroku
                            
                                How to mark a certain S3 file as Make Public
                            
                                How to get the file diff between two S3 buckets?
                            
                                AWS - How to install java11 on an EC2 Linux machine?
                            
                                How to setup Filezilla Pro to access to my s3 bucket?
                            
                                Rails implementation for securing S3 documents
                            
                                Can I get a user's s3 canonical ID from the command line?
                            
                                Send heroku logs to s3
                            
                                Trouble Transferring data from FTP server to S3 via stream using Python
                            
                                Restrict S3 Object Access to Specific Cognito Users

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

cannot copy json - Dynamo db Streams to redshift

Tags:

amazon-s3

amazon-dynamodb

amazon-kinesis-firehose

amazon-redshift

amazon-dynamodb-streams

Prasanna Nandakumar

People also ask

1 Answers

Red Boy

Recent Activity

Donate For Us