
Where does EMR store Spark stdout?

I am running my Spark application on EMR, and have several println() statements. Other than the console, where do these statements get logged?

My S3 aws-logs directory structure for my cluster looks like:

node
├── i-0031cd7a536a42g1e
│   ├── applications
│   ├── bootstrap-actions
│   ├── daemons
│   ├── provision-node
│   └── setup-devices
containers/
├── application_12341331455631_0001
│   ├── container_12341331455631_0001_01_000001

asked Dec 07 '17 by B. Smith

People also ask

Where are EMR logs stored?

Amazon EMR and Hadoop both produce log files that report status on the cluster. By default, these are written to the master node in the /mnt/var/log/ directory.

Where is Spark config file EMR?

Normally, there is a spark-defaults.conf file located in /etc/spark/conf after you create a Spark cluster on EMR.

Does AWS EMR store data?

EMRFS provides the convenience of storing persistent data in Amazon S3 for use with Hadoop while also providing features like Amazon S3 server-side encryption, read-after-write consistency, and list consistency.

How do I access spark driver logs on an EMR cluster?

How do I access Spark driver logs on an Amazon EMR cluster? On Amazon EMR, Spark runs as a YARN application and supports two deployment modes: Client mode: This is the default deployment mode. In client mode, the Spark driver runs on the host where the spark-submit command is run.

How do I get spark-submit logs from Amazon EMR?

When you submit a Spark application by running spark-submit with --deploy-mode client on the master node, the driver logs are displayed in the terminal window. Amazon EMR doesn't archive these logs by default. To capture them, save the output of the spark-submit command to a file.
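A minimal sketch of capturing that output, assuming a hypothetical application file my_app.py and log file name spark_output.log:

```shell
# Redirect both stdout and stderr of spark-submit to a file
# (my_app.py and spark_output.log are illustrative names):
spark-submit --deploy-mode client my_app.py > spark_output.log 2>&1
```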

Where can I find the EMR container logs?

The logs can be found in the following location: EMR UI Console -> Summary -> Log URI -> Containers -> application_xxx_xxx -> container_yyy_yy_yy -> stdout.gz. If you submit your job with emr-bootstrap, you can specify the log directory as an S3 bucket with --log-uri.
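The same container logs can be pulled from S3 directly. A sketch, assuming an illustrative log bucket, cluster ID, application ID, and container ID (substitute your own values from the Log URI):

```shell
# Download and unpack a container's stdout log from the S3 log archive
# (bucket name, j-XXXXXXXXXXXXX, and the application/container IDs are placeholders):
aws s3 cp s3://my-log-bucket/elasticmapreduce/j-XXXXXXXXXXXXX/containers/application_12341331455631_0001/container_12341331455631_0001_01_000001/stdout.gz .
gunzip stdout.gz
```

Note that EMR pushes these logs to S3 on a delay, so the files may take a few minutes to appear after the application runs.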

How can I Optimize my Amazon EMR cluster resources when running spark?

When running a Spark job using EMR Studio, there are a few steps you can take to help ensure that you're optimizing your Amazon EMR cluster resources. If you use Apache Livy along with Spark on your Amazon EMR cluster, we recommend that you increase your Livy session timeout by doing one of the following:
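One way to raise the Livy session timeout is through an EMR configuration classification at cluster creation time. A sketch, assuming the standard livy-conf classification and the livy.server.session.timeout property (the 2h value is illustrative):

```json
[
  {
    "Classification": "livy-conf",
    "Properties": {
      "livy.server.session.timeout": "2h"
    }
  }
]
```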


1 Answer

You can find println's in a few places:

  • Resource Manager -> Your Application -> Logs -> stdout
  • Your S3 log directory -> containers/application_.../container_.../stdout (though this takes a few minutes to populate after the application)
  • SSH into the EMR master node and run yarn logs -applicationId <Application ID> -log_files <log_file_type>
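For the third option above, a concrete invocation might look like the following, assuming the application ID from the question's directory listing (the ID is illustrative):

```shell
# Fetch only the stdout files for a finished YARN application
# (run on the EMR master node; the application ID is a placeholder):
yarn logs -applicationId application_12341331455631_0001 -log_files stdout
```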
answered Sep 26 '22 by ayplam