I am trying to make Snowplow work on AWS. When I am trying to run stream-enrich service on instance, I am getting this exception:
[main] INFO com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker - Syncing Kinesis shard info
[main] ERROR com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShardSyncTask - Caught exception while sync'ing Kinesis shards and leases
[cw-metrics-publisher] WARN com.amazonaws.services.kinesis.metrics.impl.CWPublisherRunnable - Could not publish 4 datums to CloudWatch
I don't think error is due to Cloud Watch:
Caught exception while sync'ing Kinesis shards and leases
A shard has a sequence of data records in a stream. It serves as a base throughput unit of a Kinesis data stream. A shard supports 1 MB/second and 1,000 records per second for writes and 2 MB/second for reads.
A Kinesis data stream is a set of shards. There can be multiple consumer applications for one stream, and each application can consume data independently and concurrently.
Messaging semantics: Kinesis always uses “at least once” message delivery, whereas Kafka supports both “at least once” and “exactly once” message delivery. Message size: A single message in Kinesis can be up to 1MB.
So, in order words, when your stream is processing there is a row for every shard in a corresponding dynamoDB table. These rows contain information relating to the current state of processing of that shard... and this is known as lease information.
As mentioned in the comments above, this error will crop when you're lacking permissions to AWS resources required by Kinesis Client Library (KCL). This can be the DynamoDB, CloudWatch, or Kinesis. For the Stream Enrich component of Snowplow, you'll need the following permissions:
application.conf
)A templated version of an IAM policy that meets these needs is as follows:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"kinesis:DescribeStream",
"kinesis:GetShardIterator",
"kinesis:GetRecords",
"kinesis:ListShards"
],
"Resource": [
"${collector_stream_out_good}"
]
},
{
"Effect": "Allow",
"Action": [
"kinesis:ListStreams"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"kinesis:DescribeStream",
"kinesis:PutRecord",
"kinesis:PutRecords"
],
"Resource": [
"${enricher_stream_out_good}",
"${enricher_stream_out_bad}"
]
},
{
"Effect": "Allow",
"Action": [
"dynamodb:CreateTable",
"dynamodb:DescribeTable",
"dynamodb:Scan",
"dynamodb:GetItem",
"dynamodb:PutItem",
"dynamodb:UpdateItem",
"dynamodb:DeleteItem"
],
"Resource": [
"${enricher_state_table}"
]
},
{
"Effect": "Allow",
"Action": [
"cloudwatch:PutMetricData"
],
"Resource": "*"
}
]
}
I've written up a blog post that covers required IAM permissions for Stream Enrich and other Snowplow components since documentation on the exact required permissions was sparse/non-existent in the Snowplow documentation.
Hope that helps!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With