 

Amazon Kinesis: Caught exception while sync'ing Kinesis shards and leases

I am trying to get Snowplow working on AWS. When I run the stream-enrich service on an instance, I get this exception:

[main] INFO com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker - Syncing Kinesis shard info
[main] ERROR com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShardSyncTask - Caught exception while sync'ing Kinesis shards and leases
[cw-metrics-publisher] WARN com.amazonaws.services.kinesis.metrics.impl.CWPublisherRunnable - Could not publish 4 datums to CloudWatch

I don't think the error is caused by CloudWatch. The actual error seems to be:

Caught exception while sync'ing Kinesis shards and leases

asked Jan 18 '18 at 13:01 by Prakhar Mishra




1 Answer

As mentioned in the comments above, this error crops up when you lack permissions to the AWS resources required by the Kinesis Client Library (KCL): DynamoDB, CloudWatch, or Kinesis. For the Stream Enrich component of Snowplow, you'll need the following permissions:

  • Read permission on the input Kinesis stream (collector good)
  • Write permission on the output Kinesis streams (enrich good and enrich bad)
  • List permission on Kinesis streams
  • Read/write/create permission on the DynamoDB state table (the table name is the “appName” value in your Stream Enrich application.conf)
  • PutMetricData permission on CloudWatch
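As a rough aid for auditing an existing role against the list above, here is a minimal Python sketch, assuming the action names from the template policy in this answer are the complete requirement. `REQUIRED_ACTIONS` and `missing_actions` are hypothetical helpers, not part of Snowplow, the KCL, or any AWS SDK. The check only understands exact action names plus `service:*` and `*` wildcards; it ignores resources, conditions, and explicit denies, so treat it as a first pass rather than a real IAM evaluation.

```python
from __future__ import annotations

# Hypothetical helper: these action sets mirror the template policy in this
# answer; they are not an official Snowplow or AWS list.
REQUIRED_ACTIONS: dict[str, set[str]] = {
    "input stream (collector good)": {
        "kinesis:DescribeStream", "kinesis:GetShardIterator",
        "kinesis:GetRecords", "kinesis:ListShards",
    },
    "output streams (enrich good/bad)": {
        "kinesis:DescribeStream", "kinesis:PutRecord", "kinesis:PutRecords",
    },
    "stream listing": {"kinesis:ListStreams"},
    "DynamoDB state table": {
        "dynamodb:CreateTable", "dynamodb:DescribeTable", "dynamodb:Scan",
        "dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:UpdateItem",
        "dynamodb:DeleteItem",
    },
    "CloudWatch metrics": {"cloudwatch:PutMetricData"},
}


def missing_actions(granted: set[str]) -> dict[str, set[str]]:
    """Return, per requirement, the actions not covered by `granted`.

    An action counts as covered if it is granted exactly, via its
    service-level wildcard (e.g. "kinesis:*"), or via a bare "*".
    Resources, conditions and explicit denies are NOT considered.
    """
    missing: dict[str, set[str]] = {}
    for purpose, actions in REQUIRED_ACTIONS.items():
        absent = {
            action for action in actions
            if action not in granted
            and f"{action.split(':')[0]}:*" not in granted
            and "*" not in granted
        }
        if absent:
            missing[purpose] = absent
    return missing


if __name__ == "__main__":
    # Example role: full Kinesis access and CloudWatch metrics, but no DynamoDB.
    for purpose, actions in missing_actions({"kinesis:*", "cloudwatch:PutMetricData"}).items():
        print(purpose, sorted(actions))
```

For example, a role granted `kinesis:*` and `cloudwatch:PutMetricData` but nothing on DynamoDB is reported as missing all seven DynamoDB actions, which fits a ShardSyncTask failure, since the KCL keeps its leases in DynamoDB.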

A templated version of an IAM policy that meets these needs is as follows (replace each ${...} placeholder with the ARN of the corresponding stream or table):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "kinesis:DescribeStream",
        "kinesis:GetShardIterator",
        "kinesis:GetRecords",
        "kinesis:ListShards"
      ],
      "Resource": [
        "${collector_stream_out_good}"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "kinesis:ListStreams"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "kinesis:DescribeStream",
        "kinesis:PutRecord",
        "kinesis:PutRecords"
      ],
      "Resource": [
        "${enricher_stream_out_good}",
        "${enricher_stream_out_bad}"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:CreateTable",
        "dynamodb:DescribeTable",
        "dynamodb:Scan",
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:UpdateItem",
        "dynamodb:DeleteItem"
      ],
      "Resource": [
        "${enricher_state_table}"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:PutMetricData"
      ],
      "Resource": "*"
    }
  ]
}
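The `${...}` placeholders in the template happen to match the syntax of Python's `string.Template`, so one way to fill them in is the minimal sketch below. It assumes the standard `aws` partition (China and GovCloud regions use a different ARN prefix), and the region, account ID, stream names, and app name in the usage example are hypothetical values; only the placeholder names come from the template above. The state-table ARN is built from the `appName` value, since that is the DynamoDB table name Stream Enrich uses.

```python
import json
from string import Template

# The template policy from this answer, condensed; placeholder names unchanged.
POLICY_TEMPLATE = Template("""{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow",
     "Action": ["kinesis:DescribeStream", "kinesis:GetShardIterator",
                "kinesis:GetRecords", "kinesis:ListShards"],
     "Resource": ["${collector_stream_out_good}"]},
    {"Effect": "Allow", "Action": ["kinesis:ListStreams"], "Resource": "*"},
    {"Effect": "Allow",
     "Action": ["kinesis:DescribeStream", "kinesis:PutRecord", "kinesis:PutRecords"],
     "Resource": ["${enricher_stream_out_good}", "${enricher_stream_out_bad}"]},
    {"Effect": "Allow",
     "Action": ["dynamodb:CreateTable", "dynamodb:DescribeTable", "dynamodb:Scan",
                "dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:UpdateItem",
                "dynamodb:DeleteItem"],
     "Resource": ["${enricher_state_table}"]},
    {"Effect": "Allow", "Action": ["cloudwatch:PutMetricData"], "Resource": "*"}
  ]
}""")


def kinesis_stream_arn(region: str, account_id: str, stream: str) -> str:
    # Assumes the standard "aws" partition.
    return f"arn:aws:kinesis:{region}:{account_id}:stream/{stream}"


def dynamodb_table_arn(region: str, account_id: str, table: str) -> str:
    return f"arn:aws:dynamodb:{region}:{account_id}:table/{table}"


def render_policy(region: str, account_id: str, collector_good: str,
                  enrich_good: str, enrich_bad: str, app_name: str) -> dict:
    """Fill in the template; substitute() raises KeyError if a placeholder is left unset."""
    text = POLICY_TEMPLATE.substitute(
        collector_stream_out_good=kinesis_stream_arn(region, account_id, collector_good),
        enricher_stream_out_good=kinesis_stream_arn(region, account_id, enrich_good),
        enricher_stream_out_bad=kinesis_stream_arn(region, account_id, enrich_bad),
        # The KCL state table is named after appName in application.conf.
        enricher_state_table=dynamodb_table_arn(region, account_id, app_name),
    )
    return json.loads(text)  # fails loudly if the rendered text is not valid JSON


if __name__ == "__main__":
    # Hypothetical values; substitute your own region, account and names.
    policy = render_policy("us-east-1", "123456789012", "snowplow-collector-good",
                           "snowplow-enriched-good", "snowplow-enriched-bad",
                           "snowplow-enrich")
    print(json.dumps(policy, indent=2))
```

Parsing the result with `json.loads` is a cheap sanity check before pasting the policy into the IAM console or an infrastructure-as-code tool.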

I've written a blog post covering the IAM permissions required by Stream Enrich and other Snowplow components, since the Snowplow documentation on the exact required permissions was sparse or non-existent.

Hope that helps!

answered Sep 20 '22 at 05:09 by ahawker
