
Processing DynamoDB streams using the AWS Java DynamoDB streams Kinesis adapter

I'm attempting to capture DynamoDB table changes using DynamoDB streams and the AWS provided Java DynamoDB streams Kinesis adapter. I'm working with the AWS Java SDKs in a Scala app.

I started by following the AWS guide and going through the AWS published code example. However, I'm having trouble getting Amazon's own published code to work in my environment. The problem lies with the KinesisClientLibConfiguration object.

In the example code, KinesisClientLibConfiguration is configured with the stream ARN provided by DynamoDB.

new KinesisClientLibConfiguration("streams-adapter-demo",
    streamArn, 
    streamsCredentials, 
    "streams-demo-worker")

I followed a similar pattern in my Scala app by first locating the current ARN from my Dynamo table:

lazy val streamArn = dynamoClient.describeTable(config.tableName)
  .getTable.getLatestStreamArn

And then creating the KinesisClientLibConfiguration with the provided ARN:

lazy val kinesisConfig: KinesisClientLibConfiguration =
  new KinesisClientLibConfiguration(
    "testProcess",
    streamArn,
    defaultProviderChain,
    "testWorker"
  ).withMaxRecords(1000)
    .withRegionName("eu-west-1")
    .withMetricsLevel(MetricsLevel.NONE)
    .withIdleTimeBetweenReadsInMillis(500)
    .withInitialPositionInStream(InitialPositionInStream.TRIM_HORIZON)

I've verified the provided stream ARN and everything matches what I see in the AWS console.

At runtime I end up getting an exception stating that the provided ARN is not a valid stream name:

com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShardSyncTask call
SEVERE: Caught exception while sync'ing Kinesis shards and leases
com.amazonaws.services.kinesis.model.AmazonKinesisException: 1 validation error detected: Value 'arn:aws:dynamodb:eu-west-1:STREAM ARN' at 'streamName' failed to satisfy constraint: Member must satisfy regular expression pattern: [a-zA-Z0-9_.-]+ (Service: AmazonKinesis; Status Code: 400; Error Code: ValidationException; Request ID: )

Looking at the documentation for KinesisClientLibConfiguration, this does make sense: the second parameter is listed as the streamName, with no mention of an ARN.
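To make the validation failure concrete, here is a minimal sketch that checks a value against the pattern quoted in the exception. The ARN below is a made-up placeholder (account ID, table name and timestamp are all hypothetical), and the helper names are mine, not part of any AWS library.

```scala
object StreamNameCheck {
  // Pattern copied from the ValidationException message above.
  private val StreamNamePattern = "[a-zA-Z0-9_.-]+".r.pattern

  /** True when the whole value matches the stream-name pattern. */
  def isValidKinesisStreamName(value: String): Boolean =
    StreamNamePattern.matcher(value).matches()

  /** The distinct characters in the value that the pattern rejects. */
  def offendingChars(value: String): Set[Char] =
    value.filterNot(c => StreamNamePattern.matcher(c.toString).matches()).toSet
}

// Hypothetical DynamoDB stream ARN, for illustration only.
val exampleArn =
  "arn:aws:dynamodb:eu-west-1:123456789012:table/ExampleTable/stream/2016-10-15T15:10:00.000"

println(StreamNameCheck.isValidKinesisStreamName(exampleArn))
println(StreamNameCheck.offendingChars(exampleArn))
```

The colons and slashes that every ARN contains are outside the allowed character class, so any ARN sent as a Kinesis streamName would be rejected, regardless of whether the ARN itself is correct.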

I can't find anything on KinesisClientLibConfiguration that relates to an ARN. Since I'm working with a DynamoDB stream and not a Kinesis stream, I'm also unsure how to find my stream name.

At this point I'm unsure what I'm missing from the published AWS example. It seems like they may be using a much older version of the KCL; I'm using version 1.7.0 of amazon-kinesis-client.

asked Oct 15 '16 15:10 by francis


1 Answer

The issue actually ended up being outside of my KinesisClientLibConfiguration.

I got around it by keeping the same configuration and passing the worker the stream adapter client from the DynamoDB Streams adapter library, along with clients for both DynamoDB and CloudWatch.

My working solution now looks like this.

First, define the Kinesis client config:

// Kinesis config for DynamoDB streams
lazy val kinesisConfig: KinesisClientLibConfiguration =
  new KinesisClientLibConfiguration(
    getClass.getName,          // DynamoDB shard lease table name
    streamArn,                 // pulled from the dynamo table at runtime
    dynamoCredentials,         // DefaultAWSCredentialsProviderChain
    KeywordTrackingActor.NAME  // lease owner name
  ).withMaxRecords(1000)                    // using AWS recommended value
    .withIdleTimeBetweenReadsInMillis(500)  // using AWS recommended value
    .withInitialPositionInStream(InitialPositionInStream.TRIM_HORIZON)

Then define a stream adapter client and a CloudWatch client:

val streamAdapterClient :AmazonDynamoDBStreamsAdapterClient = new AmazonDynamoDBStreamsAdapterClient(dynamoCredentials)
streamAdapterClient.setRegion(region)

val cloudWatchClient :AmazonCloudWatchClient = new AmazonCloudWatchClient(dynamoCredentials)
cloudWatchClient.setRegion(region)

Create an instance of a RecordProcessorFactory. It's up to you to define a class that implements the KCL-provided IRecordProcessorFactory, as well as the IRecordProcessor it returns.

val recordProcessorFactory :RecordProcessorFactory = new RecordProcessorFactory(context, keywordActor, config.keywordColumnName)
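For readers who have not written these two classes yet, the following is a rough sketch of what a minimal pair might look like. It assumes the KCL 1.x "v2" interfaces and the DynamoDB Streams Kinesis adapter are on the classpath. The class names LoggingRecordProcessor and LoggingRecordProcessorFactory are hypothetical and are not the RecordProcessorFactory used above, which takes application-specific constructor arguments.

```scala
import com.amazonaws.services.dynamodbv2.streamsadapter.model.RecordAdapter
import com.amazonaws.services.kinesis.clientlibrary.interfaces.v2.{IRecordProcessor, IRecordProcessorFactory}
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShutdownReason
import com.amazonaws.services.kinesis.clientlibrary.types.{InitializationInput, ProcessRecordsInput, ShutdownInput}

// Hypothetical processor: prints each change and checkpoints after every batch.
class LoggingRecordProcessor extends IRecordProcessor {

  override def initialize(input: InitializationInput): Unit =
    println(s"Starting on shard ${input.getShardId}")

  override def processRecords(input: ProcessRecordsInput): Unit = {
    val records = input.getRecords.iterator()
    while (records.hasNext) {
      records.next() match {
        // The adapter wraps each DynamoDB stream record in a RecordAdapter.
        case adapter: RecordAdapter =>
          val change = adapter.getInternalObject
          println(s"${change.getEventName}: ${change.getDynamodb.getKeys}")
        case other =>
          println(s"Unexpected record type: ${other.getClass.getName}")
      }
    }
    // Checkpointing once per batch is a simplification; real code may want retries.
    input.getCheckpointer.checkpoint()
  }

  override def shutdown(input: ShutdownInput): Unit =
    // Checkpoint only when the shard is finished, not when the lease was lost.
    if (input.getShutdownReason == ShutdownReason.TERMINATE)
      input.getCheckpointer.checkpoint()
}

// Hypothetical factory: the worker calls this once per shard it processes.
class LoggingRecordProcessorFactory extends IRecordProcessorFactory {
  override def createProcessor(): IRecordProcessor = new LoggingRecordProcessor
}
```

A factory like this could be passed to the worker builder in place of the application-specific one, which is handy for confirming that records arrive before wiring in real processing logic.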

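And the part I was missing: all of this needs to be provided to your worker. The exception in the question is reported by the AmazonKinesis service, which suggests that a worker built without an explicit kinesisClient sends the DynamoDB stream ARN to Kinesis itself rather than to the streams adapter.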

val worker :Worker =
  new Worker.Builder()
    .recordProcessorFactory(recordProcessorFactory)
    .config(kinesisConfig)
    .kinesisClient(streamAdapterClient)
    .dynamoDBClient(dynamoClient)
    .cloudWatchClient(cloudWatchClient)
    .build()

//this will start record processing
streamExecutorService.submit(worker)
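To illustrate why supplying the adapter client matters, here is a toy model of the behaviour I believe was happening. Nothing in it is real KCL code: StreamClient, KinesisLikeClient, AdapterLikeClient and ToyWorker are all hypothetical stand-ins, and the fallback to a Kinesis-style client is my reading of the symptoms rather than something confirmed from the library source here.

```scala
object ClientFallbackModel {
  // Hypothetical stand-in for the client a worker uses to look up a stream.
  trait StreamClient {
    def describeStream(streamId: String): Either[String, String]
  }

  // Mimics the Kinesis-side validation seen in the exception: plain names only.
  object KinesisLikeClient extends StreamClient {
    private val namePattern = "[a-zA-Z0-9_.-]+"
    def describeStream(streamId: String): Either[String, String] =
      if (streamId.matches(namePattern)) Right(s"kinesis stream $streamId")
      else Left(s"ValidationException: '$streamId' is not a valid streamName")
  }

  // Mimics an adapter that understands DynamoDB stream ARNs.
  object AdapterLikeClient extends StreamClient {
    def describeStream(streamId: String): Either[String, String] =
      if (streamId.startsWith("arn:aws:dynamodb:")) Right(s"dynamodb stream $streamId")
      else Left(s"'$streamId' is not a DynamoDB stream ARN")
  }

  // Toy worker: uses the Kinesis-like client when no client is supplied.
  final case class ToyWorker(streamId: String, client: Option[StreamClient] = None) {
    def syncShards(): Either[String, String] =
      client.getOrElse(KinesisLikeClient).describeStream(streamId)
  }
}

// Hypothetical ARN, for illustration only.
val arn =
  "arn:aws:dynamodb:eu-west-1:123456789012:table/ExampleTable/stream/2016-10-15T15:10:00.000"

// Same stream identifier, different outcome depending on the client supplied.
println(ClientFallbackModel.ToyWorker(arn).syncShards())
println(ClientFallbackModel.ToyWorker(arn, Some(ClientFallbackModel.AdapterLikeClient)).syncShards())
```

In this model the configuration is identical in both cases; only the client handed to the worker decides whether the ARN is accepted, which matches the observation that the fix lay outside KinesisClientLibConfiguration.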

answered Oct 12 '22 13:10 by francis