I am setting up a Spark Streaming project with Kinesis and when I try to connect to my Kinesis stream I am getting the following error from Spark:
ERROR ShardSyncTask: Caught exception while sync'ing Kinesis shards and leases
com.amazonaws.services.kinesis.clientlibrary.exceptions.internal.KinesisClientLibIOException: Parent shard shardId-000000000000 exists but not the child shard shardId-000000000002
When I post test data to this stream or read data from the stream using the base Amazon libraries I get no errors, this only occurs when I try to connect with Spark.
Below is the code that I am using for my tests:
val conf = new SparkConf().setMaster("local[2]").setAppName("KinesisCounter")
val ssc = new StreamingContext(conf, Seconds(1))
val rawStream = KinesisUtils.createStream(ssc, "dev-test", "kinesis.us-east-1.amazonaws.com", Duration(1000), InitialPositionInStream.TRIM_HORIZON, StorageLevel.MEMORY_ONLY)
rawStream.map(msg => new String(msg)).count.print
How many shards you have on Kinesis?
what I would do is:
Changing the application name or stream name can lead to Kinesis errors in some cases. If you see errors, you may need to manually delete the DynamoDB table
Hope it helps.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With