
Spark Streaming: Application health

I have a Kafka-based Spark Streaming application that runs every 5 mins. Looking at the statistics after 5 days of running, I have a few observations:

  1. The processing time gradually increases from 30 secs to 50 secs, as shown in the snapshot of the processing-time chart below.

  2. A good number of garbage collection log entries appear, as shown in the snapshot below.

Questions:

  1. Is there a good explanation for why the processing time has increased substantially, even when the number of events is more or less the same (during the last trough)?
  2. I am getting almost 70 GC log entries at the end of each processing cycle. Is this normal?
  3. Is there a better strategy to ensure the processing time remains within acceptable delays?
Mohitt asked Oct 31 '22 08:10


1 Answer

It really depends on the application. The way I'd approach debugging this issue is the following:

  1. Under the Storage tab, check whether the stored sizes keep growing. Growth there can indicate a leak of cached resources. Check the value of spark.cleaner.ttl, but better, make sure you uncache all resources once they are no longer needed (see the sketch after this list).
  2. Inspect the DAG visualization of running jobs and check whether the lineage keeps growing. If it does, make sure to perform checkpointing to cut the lineage.
  3. Reduce the number of retained batches in the UI (the spark.streaming.ui.retainedBatches parameter).
  4. Even if the number of events stays the same, check whether the amount of data processed by tasks grows over time (Stages tab -> Input column). That could point to an application-level issue.

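To make points 1-3 concrete, here is a minimal sketch of how those knobs could be wired into a Kafka-based streaming job. The application name, checkpoint path, and the cached lookup table are hypothetical, and spark.cleaner.ttl is only honoured on the older Spark 1.x line, so treat this as an illustration under those assumptions rather than a drop-in fix:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingHealthSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kafka-streaming-health")             // hypothetical app name
      // Point 3: retain fewer completed batches in the Streaming UI (default is 1000).
      .set("spark.streaming.ui.retainedBatches", "100")
      // Point 1: periodic metadata cleanup; only relevant on the Spark 1.x line.
      .set("spark.cleaner.ttl", "3600")

    val ssc = new StreamingContext(conf, Seconds(300))  // 5-minute batches, as in the question

    // Point 2: a checkpoint directory lets Spark truncate the lineage of stateful streams.
    ssc.checkpoint("hdfs:///tmp/streaming-checkpoints") // hypothetical path

    // ... create the Kafka DStream and the transformations here ...

    // Point 1: if reference data is cached per batch, release it explicitly once it is
    // no longer needed instead of relying on spark.cleaner.ttl:
    //   val lookup = loadLookupTable(ssc.sparkContext).cache()
    //   ... join against lookup ...
    //   lookup.unpersist()

    ssc.start()
    ssc.awaitTermination()
  }
}
```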
I've had relatively complex Spark Streaming applications (Spark v1.6, v2.1.1, v2.2.0) running for days without any degradation in performance, so there must be some solvable issue.

Michael Spector answered Nov 15 '22 09:11