How should records be formatted for AWS Kinesis Firehose to Redshift?

Consider the following:

  • A table in Redshift called 'people' that has fields id, name and age
  • A kinesis firehose stream called 'people' that is configured to write to the 'people' table and the value for 'Redshift table columns' is 'id,name,age'

It's not clear how to format the 'Data' blob. Here's an example of what the code looks like with the data separated by tabs:

let AWS = require('aws-sdk');
let firehose = new AWS.Firehose();
let params = {
  DeliveryStreamName: 'people',
  // id,name,age
  Records: [{Data: '4ccf6d3a-acdf-11e5-ad54-28cfe91fa8f1\tBob\t42'}]
};
firehose.putRecordBatch(params, (err, result) => {
  console.log(err || result);
});

Here are some of the docs I have checked:

  • http://docs.aws.amazon.com/firehose/latest/APIReference/API_PutRecordBatch.html
  • http://docs.aws.amazon.com/firehose/latest/dev/writing-with-sdk.html
  • http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Firehose.html#putRecordBatch-property
asked Dec 27 '15 by Jamie McCrindle

People also ask

Which data formats are supported for Firehose?

Amazon Kinesis Data Firehose can convert the format of your input data from JSON to Apache Parquet or Apache ORC before storing the data in Amazon S3. Parquet and ORC are columnar data formats that save space and enable faster queries compared to row-oriented formats like JSON.

Can Kinesis Firehose write to Redshift?

Kinesis Data Firehose delivers your data to your S3 bucket first and then issues an Amazon Redshift COPY command to load the data into your Amazon Redshift cluster. Specify an S3 bucket that you own where the streaming data should be delivered.

Can Firehose create a delivery stream to Redshift?

Firehose automatically delivers the data to the Amazon S3 bucket or Amazon Redshift table that you specify in the delivery stream. For information about supported versions, see Supported Systems and Versions. To write data to Amazon Kinesis Streams, use the Kinesis Producer destination.
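For reference, here is a minimal sketch of creating such a delivery stream with the AWS JavaScript SDK. The ARNs, JDBC URL and credentials are placeholder values, and optional settings such as buffering hints and compression are omitted:

let AWS = require('aws-sdk');
let firehose = new AWS.Firehose();
let params = {
  DeliveryStreamName: 'people',
  RedshiftDestinationConfiguration: {
    // Placeholder ARNs, URL and credentials; substitute your own.
    RoleARN: 'arn:aws:iam::123456789012:role/firehose-delivery-role',
    ClusterJDBCURL: 'jdbc:redshift://example.abc123.us-east-1.redshift.amazonaws.com:5439/mydb',
    Username: 'firehose_user',
    Password: 'secret',
    CopyCommand: {
      DataTableName: 'people',
      // This is the 'Redshift table columns' value from the console
      DataTableColumns: 'id,name,age'
    },
    // Firehose stages the records in this bucket before issuing COPY
    S3Configuration: {
      RoleARN: 'arn:aws:iam::123456789012:role/firehose-delivery-role',
      BucketARN: 'arn:aws:s3:::example-staging-bucket'
    }
  }
};
firehose.createDeliveryStream(params, (err, result) => {
  console.log(err || result);
});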


2 Answers

The answer is here:

http://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-format.html

By default, fields must be separated by a pipe character ('|') and each row must be terminated with a newline.

Updated, corrected code:

let AWS = require('aws-sdk');
let firehose = new AWS.Firehose();
let params = {
  DeliveryStreamName: 'people',
  // id,name,age
  Records: [{Data: '4ccf6d3a-acdf-11e5-ad54-28cfe91fa8f1|Bob|42\n'}]
};
firehose.putRecordBatch(params, (err, result) => {
  console.log(err || result);
});
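To extend this to multiple rows, something like the following sketch should work. The toRow helper is hypothetical, and the response's FailedPutCount is worth checking because putRecordBatch can partially fail:

let AWS = require('aws-sdk');
let firehose = new AWS.Firehose();

// Hypothetical helper: join the column values with '|' and terminate the
// row with a newline, matching Redshift's default COPY delimiters.
let toRow = (person) => [person.id, person.name, person.age].join('|') + '\n';

let people = [
  {id: '4ccf6d3a-acdf-11e5-ad54-28cfe91fa8f1', name: 'Bob', age: 42},
  {id: '5ddf7e4b-bdf0-11e5-ad54-28cfe91fa8f2', name: 'Alice', age: 35}
];

let params = {
  DeliveryStreamName: 'people',
  Records: people.map((person) => ({Data: toRow(person)}))
};

firehose.putRecordBatch(params, (err, result) => {
  // result.FailedPutCount says how many records were rejected; inspect
  // result.RequestResponses to find and retry them.
  console.log(err || result);
});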
answered Nov 05 '22 by Jamie McCrindle

You can also send straight JSON as long as you escape things properly and use the right COPY options. See COPY FROM JSON and JSON AS. If you use the 'auto' option for JSON AS, it's pretty flexible. If you send multiple dictionaries in one putRecordBatch, do not separate them with spaces or newlines; you can simply concatenate them:

{"key1": "value1", "key2": 34}{"key1": "value2", "key2": 45}
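Here is a rough sketch of that approach, assuming the delivery stream's COPY options include json 'auto'; the stream name and records are illustrative:

let AWS = require('aws-sdk');
let firehose = new AWS.Firehose();

let people = [
  {id: '4ccf6d3a-acdf-11e5-ad54-28cfe91fa8f1', name: 'Bob', age: 42},
  {id: '5ddf7e4b-bdf0-11e5-ad54-28cfe91fa8f2', name: 'Alice', age: 35}
];

let params = {
  DeliveryStreamName: 'people',
  // JSON.stringify takes care of escaping; note there is deliberately no
  // separator between the serialized objects.
  Records: [{Data: people.map((person) => JSON.stringify(person)).join('')}]
};

firehose.putRecordBatch(params, (err, result) => {
  console.log(err || result);
});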
answered Nov 05 '22 by Joel