
Has anyone experienced data loss when using AWS Kinesis Streams, Lambda and Firehose?

I'm currently sending a series of XML messages to an AWS Kinesis stream. I've used this setup on different projects, so I'm pretty confident that this bit works. I've then written a Lambda to forward events from the Kinesis stream to Kinesis Firehose:

import os
import boto3
import base64

firehose = boto3.client('firehose')


def lambda_handler(event, context):
    deliveryStreamName = os.environ['FIREHOSE_STREAM_NAME']

    # Send record directly to firehose
    for record in event['Records']:
        # Kinesis delivers record data base64-encoded; decode it before
        # forwarding, otherwise base64 text ends up in the S3 objects
        data = base64.b64decode(record['kinesis']['data'])

        response = firehose.put_record(
            DeliveryStreamName=deliveryStreamName,
            Record={'Data': data}
        )
        print(response)

I've set the Kinesis stream as the Lambda trigger, with a batch size of 1 and starting position LATEST.
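For reference, the same trigger can be created programmatically via Lambda's `create_event_source_mapping` API. A minimal sketch of building the parameters (the function name and stream ARN below are placeholders, not from the original setup):

```python
def build_mapping_params(function_name, stream_arn, batch_size=1,
                         starting_position='LATEST'):
    """Build kwargs for lambda_client.create_event_source_mapping.

    batch_size=1 means each Lambda invocation receives a single
    Kinesis record; starting_position='LATEST' skips records that
    were already in the stream when the mapping was created.
    """
    return {
        'FunctionName': function_name,
        'EventSourceArn': stream_arn,
        'BatchSize': batch_size,
        'StartingPosition': starting_position,
    }


params = build_mapping_params(
    'kinesis-to-firehose',  # placeholder function name
    'arn:aws:kinesis:eu-west-1:123456789012:stream/my-stream')
# boto3.client('lambda').create_event_source_mapping(**params)
```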

For the kinesis firehose I have the following config:

Data transformation*: Disabled
Source record backup*: Disabled
S3 buffer size (MB)*: 10
S3 buffer interval (sec)*: 60
S3 Compression: UNCOMPRESSED
S3 Encryption: No Encryption
Status: ACTIVE
Error logging: Enabled

I sent 162 events and read them back from S3; the most I've managed to get is 160, and usually it's less. I've even tried waiting a few hours in case something strange was happening with retries.

Has anyone had experience using Kinesis -> Lambda -> Firehose and seen issues of lost data?

lorilew asked Jul 04 '17

People also ask

How is Kinesis data streams different from Kinesis firehose?

Kinesis Data Streams is a low-latency streaming service with the capacity to ingest data at scale. Kinesis Firehose, on the other hand, serves as a data transfer service: its primary purpose is loading streaming data into Amazon S3, Splunk, Elasticsearch, and Redshift.

How long does Kinesis stream data last?

A Kinesis data stream stores records for 24 hours by default, extendable up to 8,760 hours (365 days). You can update the retention period via the Kinesis Data Streams console or by using the IncreaseStreamRetentionPeriod and DecreaseStreamRetentionPeriod operations.
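Since Kinesis exposes separate increase/decrease operations, a helper has to look up the current retention first and pick the right call. A minimal sketch with boto3 (the stream name is a placeholder; `client` would be `boto3.client('kinesis')` in real use):

```python
def set_retention_hours(client, stream_name, target_hours):
    """Move a stream's retention period to target_hours.

    Looks up the current retention via describe_stream_summary, then
    calls Increase/DecreaseStreamRetentionPeriod as appropriate.
    Returns (previous_hours, target_hours).
    """
    if not 24 <= target_hours <= 8760:
        raise ValueError('retention must be between 24 and 8760 hours')
    summary = client.describe_stream_summary(StreamName=stream_name)
    current = summary['StreamDescriptionSummary']['RetentionPeriodHours']
    if target_hours > current:
        client.increase_stream_retention_period(
            StreamName=stream_name, RetentionPeriodHours=target_hours)
    elif target_hours < current:
        client.decrease_stream_retention_period(
            StreamName=stream_name, RetentionPeriodHours=target_hours)
    return current, target_hours


# Usage sketch:
# kinesis = boto3.client('kinesis')
# set_retention_hours(kinesis, 'my-stream', 168)  # keep records 7 days
```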

How often does Kinesis data firehose read data from my Kinesis stream?

If your data source is Direct PUT and data delivery to your Amazon S3 bucket fails, Amazon Kinesis Data Firehose retries delivery every 5 seconds for up to a maximum period of 24 hours.


1 Answer

From what I can see here, the records are most likely being lost when you publish them to the Kinesis stream (not to Firehose).

Since you are using put_record when writing to Firehose, a failed write will throw an exception and the Lambda invocation will be retried. (It still makes sense to check for failures at that level.)

Considering that, I suspect records are lost before they ever reach the Kinesis stream. If you are sending items to the stream with the put_records method, there is no guarantee that every record makes it: some may fail due to exceeded write throughput or internal errors. In that case the failed subset of records should be re-sent by your code (the AWS documentation shows a Java example; I wasn't able to find a Python one).
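A minimal Python sketch of that retry loop, assuming records shaped as `{'Data': ..., 'PartitionKey': ...}` and a placeholder stream name. The key point is that put_records is not all-or-nothing: the response carries a FailedRecordCount, and each failed entry in the parallel Records list has an ErrorCode, so only those entries need re-sending:

```python
import time


def put_records_with_retries(client, stream_name, records, max_attempts=3):
    """Send records via kinesis put_records, re-sending any that fail.

    The response's Records list is positionally aligned with the request,
    so entries containing 'ErrorCode' identify which records to retry.
    Returns the records that still failed after all attempts.
    """
    pending = records
    for attempt in range(max_attempts):
        response = client.put_records(StreamName=stream_name,
                                      Records=pending)
        if response['FailedRecordCount'] == 0:
            return []
        # Keep only the records whose matching response entry failed.
        pending = [rec for rec, res in zip(pending, response['Records'])
                   if 'ErrorCode' in res]
        time.sleep(2 ** attempt * 0.1)  # simple exponential backoff
    return pending


# Usage sketch:
# kinesis = boto3.client('kinesis')
# leftovers = put_records_with_retries(kinesis, 'my-stream', records)
# if leftovers:
#     ...  # log/alert: these records were never accepted by the stream
```

Counting `leftovers` against what you sent would show exactly where the 162-vs-160 discrepancy arises.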

k0lpak answered Nov 15 '22