Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

AWS Kinesis Firehose not inserting data in Redshift

I try to have a Kinesis Firehose pushing data in a Redshift table.

The firehose stream is working and putting data in S3.

But nothing arrive in the destination table in Redshift.

  • In the metrics DeliveryToRedshift Success is 0 (DeliveryToRedshift Records is empty)
  • The load logs (redshift web console) and STL_LOAD_ERRORS table are empty.
  • I checked that Firehose is able to connect to Redshift (I see the connections in STL_CONNECTION_LOG)

How can I troubleshoot this ?

like image 785
mathieu Avatar asked Dec 10 '15 16:12

mathieu


People also ask

Can Kinesis firehose write to Redshift?

Kinesis Data Firehose delivers your data to your S3 bucket first and then issues an Amazon Redshift COPY command to load the data into your Amazon Redshift cluster. Specify an S3 bucket that you own where the streaming data should be delivered. Create a new S3 bucket, or choose an existing bucket that you own.

Can Kinesis data streams write to Redshift?

Redshift is one such destination supported by Kinesis and data can be streamed from Kinesis to Redshift. Amazon's Redshift is a fully managed cloud-based data warehouse service from the Amazon Web Services(AWS) family. It is designed to store petabytes of data in its data warehouse storage.

What is the maximum size Kinesis data firehose record can have?

Q: What is a record in Kinesis Data Firehose? A record is the data of interest your data producer sends to a delivery stream. The maximum size of a record (before Base64-encoding) is 1024 KB.


2 Answers

In the end, I made it work by deleting and re-creating the Firehose stream :-/ Probably the repeated edits via the web console made the thing unstable.

But here are troubleshooting guidelines :

  • A good start point is this procedure : http://docs.aws.amazon.com/firehose/latest/dev/troubleshooting.html
  • Check that data is arriving in S3
    • There must be an IAM role for firehose delivery, with a trust relationship between the firehose service and this role
    • This IAM role must have S3 access policy
    • See the policy jsons here : http://docs.aws.amazon.com/firehose/latest/dev/controlling-access.html#using-iam-s3
  • At this point, the "DeliveryToS3 Success" metric in Firehose monitoring should be non-zero
  • The Redshift cluster must be publicly accessible (see cluster web console)
  • The security group of the cluster must allow inbound traffic from Firehose IP addresses : Firehose currently uses one CIDR block for each available AWS Region:
    • 52.70.63.192/27 for US East (N. Virginia)
    • 52.89.255.224/27 for US West (Oregon)
    • 52.19.239.192/27 for EU (Ireland)
  • Double check the redshift user/password you gave to Firehose
  • At this point, you should be able to see the connection attempts in Redshift logs :

    select * from stl_connection_log where remotehost like '52%' order by recordtime desc;  
    
  • Check that the Redshift user used by Firehose has enough privileges on the target table :

    select tablename, 
       HAS_TABLE_PRIVILEGE(tablename, 'select') as select,
       HAS_TABLE_PRIVILEGE(tablename, 'insert') as insert,
       HAS_TABLE_PRIVILEGE(tablename, 'update') as update,
       HAS_TABLE_PRIVILEGE(tablename, 'delete') as delete, 
       HAS_TABLE_PRIVILEGE(tablename, 'references') as references 
    from pg_tables where schemaname='public' order by tablename;
    
  • Then you can check if the COPY command is run :

    select * from stl_query order by endtime desc limit 10;
    
  • Then check load errors, or server errors :

    select * from stl_load_errors  order by starttime desc;
    select * from stl_error where userid!=0 order by recordtime desc;
    
  • If you have format problems in your data, or in the COPY options, or a mismatch between your data and the target columns, you should at least see the COPY attempts, and some load errors.

  • If you're still stuck, with nothing appearing in those log tables, try deleting and recreating the whole firehose stream, as there may be some bugs related to the web console. (This step worked for me)

like image 92
mathieu Avatar answered Sep 19 '22 05:09

mathieu


Go to the IAM role(firehose_delivery_role) auto-created during the Kinesis Firehose setup and make sure that the following roles are attached:

AmazonS3FullAccess
AmazonRedshiftFullAccess
AmazonKinesisFullAccess
AmazonKinesisFirehoseFullAccess

There is a bug that omits the S3 credentials in IAM, leaving the Kinesis setup unable to work.

Also verify that you in fact see the data files accumulating in S3.

like image 30
user5665464 Avatar answered Sep 20 '22 05:09

user5665464