
Disable CloudWatch for AWS Kinesis at Spark Streaming

Is it possible to disable CloudWatch metrics reporting for AWS Kinesis in Spark Streaming?

Here is the code. I get numStreams from the AmazonKinesisClient API:

    // Create the Kinesis DStreams
    List<JavaDStream<byte[]>> streamsList = new ArrayList<>(numStreams);
    for (int i = 0; i < numStreams; i++) {
      streamsList.add(
          KinesisUtils.createStream(jssc, kinesisAppName, streamName, endpointUrl, regionName,
              InitialPositionInStream.TRIM_HORIZON, kinesisCheckpointInterval,
              StorageLevel.MEMORY_AND_DISK_2(), accessesKey, secretKey)
      );
    }

I looked through the API and couldn't find any reference to disabling CloudWatch reporting in Spark Streaming.

Here are the warnings I am trying to get rid of:

17/01/23 17:46:29 WARN CWPublisherRunnable: Could not publish 16 datums to CloudWatch
com.amazonaws.AmazonServiceException: User: arn:aws:iam:::user/Kinesis_Service is not authorized to perform: cloudwatch:PutMetricData (Service: AmazonCloudWatch; Status Code: 403; Error Code: AccessDenied; Request ID: *****)
    at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1377)
    at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:923)
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:701)
    at com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:453)
    at com.amazonaws.http.AmazonHttpClient.executeWithTimer(AmazonHttpClient.java:415)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:364)
    at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.doInvoke(AmazonCloudWatchClient.java:984)
    at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.invoke(AmazonCloudWatchClient.java:954)
    at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.putMetricData(AmazonCloudWatchClient.java:853)
    at com.amazonaws.services.kinesis.metrics.impl.DefaultCWMetricsPublisher.publishMetrics(DefaultCWMetricsPublisher.java:63)
    at com.amazonaws.services.kinesis.metrics.impl.CWPublisherRunnable.runOnce(CWPublisherRunnable.java:144)
    at com.amazonaws.services.kinesis.metrics.impl.CWPublisherRunnable.run(CWPublisherRunnable.java:90)
    at java.lang.Thread.run(Unknown Source)

Tal.Bary asked Oct 30 '22 13:10


1 Answer

Preface: I know this is a fairly old question, but I just ran into this, so I'm posting a solution for anyone who encounters the issue with Spark <= 2.3.3.

It is possible to disable CloudWatch metrics reporting at the KCL (Kinesis Client Library) level with the withMetrics methods when building the client.

Unfortunately, Spark's KinesisInputDStream does not expose a way to change this setting, and to make things worse, the default level is "DETAILED", which sends tens of metrics every 10 seconds.

The way I disabled it was to pass invalid credentials to the cloudWatchCredentials method of the KinesisInputDStream builder, i.e.: .cloudWatchCredentials(SparkAWSCredentials.builder.basicCredentials("DISABLED", "DISABLED").build())
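For context, here is a rough sketch of where that call could sit in a full builder chain. It assumes Scala and the spark-streaming-kinesis-asl module for Spark 2.2 through 2.3.x. The object name, method name, and parameter names are hypothetical and simply mirror the variables in the question. Treat it as an illustration of the workaround, not as tested production code.

```scala
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Duration, StreamingContext}
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.kinesis.{KinesisInputDStream, SparkAWSCredentials}

// Hypothetical helper; names are illustrative only.
object KinesisWithoutCloudWatch {
  def createStream(
      ssc: StreamingContext,
      appName: String,
      streamName: String,
      endpointUrl: String,
      regionName: String,
      checkpointInterval: Duration,
      accessKey: String,
      secretKey: String): DStream[Array[Byte]] = {

    // Real credentials, used for Kinesis (and for DynamoDB checkpoints by default).
    val kinesisCreds = SparkAWSCredentials.builder
      .basicCredentials(accessKey, secretKey)
      .build()

    // Deliberately invalid credentials, so every CloudWatch publish attempt is rejected.
    val bogusCloudWatchCreds = SparkAWSCredentials.builder
      .basicCredentials("DISABLED", "DISABLED")
      .build()

    KinesisInputDStream.builder
      .streamingContext(ssc)
      .checkpointAppName(appName)
      .streamName(streamName)
      .endpointUrl(endpointUrl)
      .regionName(regionName)
      .initialPositionInStream(InitialPositionInStream.TRIM_HORIZON)
      .checkpointInterval(checkpointInterval)
      .storageLevel(StorageLevel.MEMORY_AND_DISK_2)
      .kinesisCredentials(kinesisCreds)
      .cloudWatchCredentials(bogusCloudWatchCreds)
      .build()
  }
}
```

Note that the KCL still attempts to publish metrics with this setup; the calls just fail, which is why the log noise has to be handled separately.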

That leaves the CloudWatchAsyncClient logging a warning at each tick, which I silenced by setting the following in Spark's log4j.properties config:

# Raise Kinesis metrics logging to ERROR, since we intentionally provide
# wrong credentials in order to disable CloudWatch reporting. Bad-credential
# warnings are logged at WARN level, so they are hidden while errors still get through.
log4j.logger.com.amazonaws.services.kinesis.metrics=ERROR

This suppresses warnings from classes in the metrics package only (such as the one you mentioned) but does not suppress errors, in case those are needed.
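If hiding warnings for the whole metrics package feels too broad, a narrower variant might target only the class that appears in the question's log output. This is a sketch that assumes the KCL names its loggers after the fully qualified class name, as the stack trace suggests; it was not part of the original workaround.

```properties
# Assumption: logger name equals the fully qualified class name seen in the stack trace.
# Only the CloudWatch publisher's WARN messages are hidden; other metrics classes keep their defaults.
log4j.logger.com.amazonaws.services.kinesis.metrics.impl.CWPublisherRunnable=ERROR
```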

This is nowhere near an ideal solution, but it allowed us to deploy a fix while staying on the Spark version we already had deployed.

Next step: open a ticket with Spark so that, hopefully, a future version lets us disable it.

Edit: created https://issues.apache.org/jira/browse/SPARK-27420 for tracking.

jgagnon1 answered Nov 15 '22 08:11