Setup: Amazon EMR, Apache Spark 2.3, Apache Kafka, about 10 million records per day.
Apache Spark processes events in 5-minute batches. About once per day the worker nodes die and AWS automatically reprovisions them. The log messages suggest the nodes have run out of disk space, yet they have about 1 TB of storage.
Has anyone run into storage space issues in a case where there should be more than enough?
My guess is that log aggregation is failing to copy the logs to the S3 bucket, which, as far as I can tell, the Spark process should do automatically.
What information should I provide to help resolve this issue?
Thank you in advance!
Apache Spark is a distributed processing framework and programming model that helps you do machine learning, stream processing, or graph analytics using Amazon EMR clusters. Similar to Apache Hadoop, Spark is an open-source, distributed processing system commonly used for big data workloads.
The Real-Time Analytics with Spark Streaming solution is an AWS-provided reference implementation that automatically provisions and configures the AWS services necessary to start processing real-time and batch data in minutes. The solution is designed to work with customers' Spark Streaming applications.
EMR features Amazon EMR runtime for Apache Spark, a performance-optimized runtime environment for Apache Spark that is active by default on Amazon EMR clusters.
Amazon EMR (previously called Amazon Elastic MapReduce) is a managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data.
I had a similar issue with a Structured Streaming app on EMR: disk usage grew rapidly until the application stalled or crashed.
In my case the fix was to disable the Spark event log by setting
spark.eventLog.enabled
to false. See:
http://queirozf.com/entries/spark-streaming-commong-pitfalls-and-tips-for-long-running-streaming-applications#aws-emr-only-event-logs-under-hdfs-var-log-spark-apps-when-using-a-history-server
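For illustration, here is a minimal sketch of one way to pass that setting when submitting a job, assuming the job is launched with spark-submit. The helper name build_spark_submit and the script name streaming_job.py are hypothetical and not from the original posts; only the --conf flag and the spark.eventLog.enabled property are real Spark options. Note that with the event log disabled, the Spark History Server will have nothing to show for the application.

```python
import shlex


def build_spark_submit(app_path, extra_conf=None, event_log_enabled=False):
    """Build a spark-submit argument list with event logging toggled.

    Hypothetical helper for illustration only: it assembles arguments and
    does not launch anything.
    """
    # spark.eventLog.enabled is the property named in the answer above.
    conf = {"spark.eventLog.enabled": str(event_log_enabled).lower()}
    # Any extra properties are assumptions supplied by the caller.
    conf.update(extra_conf or {})

    args = ["spark-submit"]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app_path)
    return args


if __name__ == "__main__":
    # streaming_job.py is a placeholder application path.
    command = build_spark_submit("streaming_job.py")
    print(" ".join(shlex.quote(part) for part in command))
```

The same property can also be set in spark-defaults.conf or on the SparkSession builder; which is appropriate depends on how the cluster is configured.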