I am running my Spark application on EMR, and have several println() statements. Other than the console, where do these statements get logged?
My S3 aws-logs directory structure for my cluster looks like:
node
├── i-0031cd7a536a42g1e
│ ├── applications
│ ├── bootstrap-actions
│ ├── daemons
│ ├── provision-node
│ └── setup-devices
containers/
├── application_12341331455631_0001
│ ├── container_12341331455631_0001_01_000001
Amazon EMR and Hadoop both produce log files that report status on the cluster. By default, these are written to the master node in the /mnt/var/log/ directory.
Normally, there is a spark-defaults.conf file located in /etc/spark/conf after I create a Spark cluster on EMR.
EMRFS provides the convenience of storing persistent data in Amazon S3 for use with Hadoop while also providing features like Amazon S3 server-side encryption, read-after-write consistency, and list consistency.
How do I access Spark driver logs on an Amazon EMR cluster?
On Amazon EMR, Spark runs as a YARN application and supports two deployment modes. Client mode is the default deployment mode: the Spark driver runs on the host where the spark-submit command is run.
When you submit a Spark application by running spark-submit with --deploy-mode client on the master node, the driver logs are displayed in the terminal window. Amazon EMR doesn't archive these logs by default. To capture the logs, save the output of the spark-submit command to a file.
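A minimal sketch of one way to save that output: the shell function below (run_and_capture is a hypothetical helper name, not part of EMR or Spark) runs any command and redirects its standard output and standard error to separate files. In client mode, println calls made in the driver write to standard output, so they should end up in the first file. The class name, jar name, and log file names in the usage comment are placeholders.

```shell
# Hypothetical helper: run a command, writing stdout and stderr to separate files.
# Usage: run_and_capture <stdout_file> <stderr_file> <command> [args...]
run_and_capture() {
  out_file="$1"
  err_file="$2"
  shift 2
  "$@" 1>"$out_file" 2>"$err_file"
}

# Example call on the master node (class and jar names are placeholders):
# run_and_capture driver_stdout.log driver_stderr.log \
#   spark-submit --deploy-mode client --class com.example.MyApp my-app.jar
```

Keeping the two streams apart makes the println output easier to find, since Spark's own log messages normally go to standard error.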
Found logs in the following location: EMR UI Console -> Summary -> Log URI -> Containers -> application_xxx_xxx -> container_yyy_yy_yy -> stdout.gz.
If you submit your job with emr-bootstrap, you can specify the log directory as an S3 bucket with --log-uri.
When running a Spark job using EMR Studio, there are a few steps you can take to help ensure that you're optimizing your Amazon EMR cluster resources. If you use Apache Livy along with Spark on your Amazon EMR cluster, we recommend that you increase your Livy session timeout by doing one of the following:
You can find the println output in a few places:
containers/application_.../container_.../stdout (though this takes a few minutes to populate after the application runs)
yarn logs -applicationId <Application ID> -log_files <log_file_type>
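As a sketch of the yarn logs route, assuming log aggregation is enabled and you are logged in to a cluster node where the yarn CLI is available: the function below (fetch_app_stdout is a hypothetical name) fills in the command above with stdout as the log file type, since that is where println output is written. The application ID in the usage comment is the one from the directory listing in the question.

```shell
# Hypothetical wrapper around the yarn logs command quoted above.
# Prints the aggregated stdout of the containers of one YARN application.
fetch_app_stdout() {
  app_id="$1"
  yarn logs -applicationId "$app_id" -log_files stdout
}

# Example call (application ID taken from the listing in the question):
# fetch_app_stdout application_12341331455631_0001 > println_output.txt
```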