I would like to know if it's possible to disable CloudWatch metrics publishing for the Kinesis receivers in Spark Streaming.
Here is the code (numStreams is obtained through the AmazonKinesisClient API):
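```java
import java.util.ArrayList;
import java.util.List;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.kinesis.KinesisUtils;
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream;

// Create the Kinesis DStreams
List<JavaDStream<byte[]>> streamsList = new ArrayList<>(numStreams);
for (int i = 0; i < numStreams; i++) {
    streamsList.add(
        KinesisUtils.createStream(jssc, kinesisAppName, streamName, endpointUrl, regionName,
            InitialPositionInStream.TRIM_HORIZON, kinesisCheckpointInterval,
            StorageLevel.MEMORY_AND_DISK_2(), accessesKey, secretKey)
    );
}
```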
I looked through the API but couldn't find any reference to disabling CloudWatch metrics in Spark Streaming.
Here is the warning I'm trying to get rid of:
```
17/01/23 17:46:29 WARN CWPublisherRunnable: Could not publish 16 datums to CloudWatch
com.amazonaws.AmazonServiceException: User: arn:aws:iam:::user/Kinesis_Service is not authorized to perform: cloudwatch:PutMetricData (Service: AmazonCloudWatch; Status Code: 403; Error Code: AccessDenied; Request ID: *****)
    at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1377)
    at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:923)
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:701)
    at com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:453)
    at com.amazonaws.http.AmazonHttpClient.executeWithTimer(AmazonHttpClient.java:415)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:364)
    at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.doInvoke(AmazonCloudWatchClient.java:984)
    at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.invoke(AmazonCloudWatchClient.java:954)
    at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.putMetricData(AmazonCloudWatchClient.java:853)
    at com.amazonaws.services.kinesis.metrics.impl.DefaultCWMetricsPublisher.publishMetrics(DefaultCWMetricsPublisher.java:63)
    at com.amazonaws.services.kinesis.metrics.impl.CWPublisherRunnable.runOnce(CWPublisherRunnable.java:144)
    at com.amazonaws.services.kinesis.metrics.impl.CWPublisherRunnable.run(CWPublisherRunnable.java:90)
    at java.lang.Thread.run(Unknown Source)
```
Preface: I know this is an old question, but I just ran into this, so I'm posting a solution for anyone who encounters this issue with Spark <= 2.3.3.
It is possible to disable CloudWatch metrics reporting at the KCL (Kinesis Client Library) level with the withMetrics* methods (for example withMetricsLevel) when building the client.
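For reference, this is roughly what that KCL-level switch looks like when you build the worker configuration yourself, outside Spark. It is a minimal sketch assuming KCL 1.x (amazon-kinesis-client) on the classpath; the class name KclMetricsConfig and the application, stream and worker names are hypothetical placeholders.

```java
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream;
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.KinesisClientLibConfiguration;
import com.amazonaws.services.kinesis.metrics.interfaces.MetricsLevel;

// Hypothetical helper class, for illustration only.
public class KclMetricsConfig {

    // Builds a KCL 1.x worker configuration that publishes no CloudWatch metrics.
    public static KinesisClientLibConfiguration buildConfig(String appName, String streamName, String workerId) {
        return new KinesisClientLibConfiguration(
                appName,
                streamName,
                new DefaultAWSCredentialsProviderChain(),
                workerId)
            .withInitialPositionInStream(InitialPositionInStream.TRIM_HORIZON)
            // NONE switches CloudWatch metrics publishing off; the KCL default is DETAILED.
            .withMetricsLevel(MetricsLevel.NONE);
    }

    public static void main(String[] args) {
        KinesisClientLibConfiguration config = buildConfig("my-kinesis-app", "my-stream", "worker-1");
        System.out.println(config.getMetricsLevel() == MetricsLevel.NONE); // true
    }
}
```

The catch, as described next, is that Spark creates this configuration internally, so application code never gets to call withMetricsLevel.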
Unfortunately, Spark's KinesisInputDStream does not expose a way to change this setting, and to make things worse, the default level is "DETAILED", which sends dozens of metrics every 10 seconds.
The way I disabled it was to pass invalid credentials to the cloudWatchCredentials method of the KinesisInputDStream builder, i.e.: .cloudWatchCredentials(SparkAWSCredentials.builder.basicCredentials("DISABLED", "DISABLED").build())
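Since the question's code is Java, here is a sketch of the same trick from Java. It assumes Spark 2.3.x with spark-streaming-kinesis-asl (the builder API with initialPosition and KinesisInitialPositions; on 2.2.x the builder takes an InitialPositionInStream instead). The class KinesisNoCloudWatch and its method names are hypothetical. SparkAWSCredentials.builder is a Scala object method, so this sketch reaches it through SparkAWSCredentials$.MODULE$, the standard way to call a Scala companion object from Java.

```java
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kinesis.KinesisInitialPositions;
import org.apache.spark.streaming.kinesis.KinesisInputDStream;
import org.apache.spark.streaming.kinesis.SparkAWSCredentials;
import org.apache.spark.streaming.kinesis.SparkAWSCredentials$;

// Hypothetical helper class, for illustration only.
public class KinesisNoCloudWatch {

    // Deliberately invalid credentials used only by the CloudWatch client,
    // so every PutMetricData call is rejected instead of publishing metrics.
    public static SparkAWSCredentials disabledCloudWatchCredentials() {
        // Scala companion object accessed from Java via MODULE$.
        return SparkAWSCredentials$.MODULE$.builder()
            .basicCredentials("DISABLED", "DISABLED")
            .build();
    }

    public static KinesisInputDStream<byte[]> createStream(JavaStreamingContext jssc,
                                                           String appName,
                                                           String streamName,
                                                           String endpointUrl,
                                                           String regionName,
                                                           Duration checkpointInterval) {
        return KinesisInputDStream.builder()
            .streamingContext(jssc)
            .checkpointAppName(appName)
            .streamName(streamName)
            .endpointUrl(endpointUrl)
            .regionName(regionName)
            .initialPosition(new KinesisInitialPositions.TrimHorizon())
            .checkpointInterval(checkpointInterval)
            .storageLevel(StorageLevel.MEMORY_AND_DISK_2())
            // Only CloudWatch gets the dummy credentials; Kinesis and DynamoDB
            // credentials are left at the builder defaults in this sketch.
            .cloudWatchCredentials(disabledCloudWatchCredentials())
            .build();
    }
}
```

If you need explicit keys for Kinesis itself, as in the question, they would go through the builder's kinesisCredentials method rather than cloudWatchCredentials.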
Then comes the issue of CloudWatchAsyncClient logging a warning at each tick, which I silenced with the following setting in Spark's log4j.properties:
```properties
# Set Kinesis metrics logging to ERROR, since we intentionally provide
# wrong credentials in order to disable CloudWatch publishing. The bad-credential
# warnings are logged at WARN level, so errors still get through.
log4j.logger.com.amazonaws.services.kinesis.metrics=ERROR
```
This suppresses warnings only for classes in the metrics package (such as the one in the question) but still lets errors through, in case those are needed.
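The same logger setting can also be applied at runtime through the log4j 1.x API that Spark 2.x bundles. This is a minimal sketch with a hypothetical class name; note that it only affects the JVM it runs in, so for receivers running on executors the log4j.properties route is usually the simpler one.

```java
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

// Hypothetical helper class, for illustration only.
public class QuietKinesisMetrics {

    // Same effect as log4j.logger.com.amazonaws.services.kinesis.metrics=ERROR.
    public static void suppressMetricsWarnings() {
        Logger.getLogger("com.amazonaws.services.kinesis.metrics").setLevel(Level.ERROR);
    }

    public static void main(String[] args) {
        suppressMetricsWarnings();
        // Child loggers such as CWPublisherRunnable inherit ERROR, so WARN messages are dropped.
        Logger publisher = Logger.getLogger("com.amazonaws.services.kinesis.metrics.impl.CWPublisherRunnable");
        System.out.println(publisher.isEnabledFor(Level.WARN));  // false
        System.out.println(publisher.isEnabledFor(Level.ERROR)); // true
    }
}
```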
This is nowhere near an ideal solution, but it allowed us to deploy a fix while keeping our existing Spark version in production.
Next step: open a ticket with Spark so that future versions will hopefully allow disabling it.
Edit: created https://issues.apache.org/jira/browse/SPARK-27420 for tracking.