I'd like to collect all the executor logs in the Spark application driver programmatically. (When something fails, I want to collect and store all the relevant logs.) Is there a nice way to do this?
One idea is to create an empty RDD with one partition per executor. Then I somehow ensure that each partition is actually processed on a different executor (no idea how), do a mapPartitions in which I load the executor log from disk, and then a collect to fetch the logs back to the application.
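Something like this rough sketch is what I have in mind (the work directory /home/hadoop/spark/work, the stdout/stderr layout, and the executor-count heuristic are all assumptions on my part, and I still don't know how to guarantee that each partition lands on a different executor):

import java.io.File
import java.net.InetAddress
import scala.io.Source
import org.apache.spark.sql.SparkSession

object CollectExecutorLogs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("collect-executor-logs").getOrCreate()
    val sc = spark.sparkContext

    // One partition per executor (minus the driver entry). Spark does NOT
    // guarantee that each partition is scheduled on a distinct executor.
    val numExecutors = math.max(sc.getExecutorMemoryStatus.size - 1, 1)

    val logs = sc
      .parallelize(1 to numExecutors, numExecutors)
      .mapPartitions { _ =>
        // Assumed standalone-mode layout: work/<app-id>/<executor-id>/{stdout,stderr}
        val workDir = new File("/home/hadoop/spark/work")
        def children(d: File): Array[File] = Option(d.listFiles).getOrElse(Array.empty[File])
        val logFiles = for {
          appDir  <- children(workDir) if appDir.isDirectory
          execDir <- children(appDir)  if execDir.isDirectory
          log     <- children(execDir) if log.getName == "stdout" || log.getName == "stderr"
        } yield log
        val host = InetAddress.getLocalHost.getHostName
        logFiles.iterator.map { f =>
          val src = Source.fromFile(f)
          val contents = try src.mkString finally src.close()
          (host, f.getPath, contents)
        }
      }
      .collect() // ship (host, path, contents) tuples back to the driver

    logs.foreach { case (host, path, contents) =>
      println(s"==== $host : $path ====")
      println(contents)
    }
  }
}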
Perhaps there is a better way, but we use a script to sync executor logs to S3 every 5 seconds:
#!/bin/bash
# This script syncs Spark executor log files to S3.
# Usage: sync-executor-logs.sh -l|--log-uri s3://bucket/prefix
while [[ $# -gt 1 ]]; do
  key="$1"
  case $key in
    -l|--log-uri)
      LOG_BUCKET="$2"
      shift
      ;;
    *)
      echo "Unknown option: ${key}"
      exit 1
      ;;
  esac
  shift
done
set -u
# Extract the EMR job flow id (j-xxxxxxxx) from the instance info file.
JOB_FLOW_ID=$(grep jobFlowId /mnt/var/lib/info/job-flow.json | sed -e 's,.*"\(j-.*\)".*,\1,g')
# Start a background loop that syncs the Spark work directory every 5 seconds.
while true; do
  aws s3 sync /home/hadoop/spark/work "${LOG_BUCKET}/${JOB_FLOW_ID}/executors/$(hostname)/"
  sleep 5
done &
We launch the script (which is stored on S3 in a file named sync-executor-logs.sh) in a bootstrap action:
--bootstrap-actions Path=s3://path/to/my/script/sync-executor-logs.sh,Name=Sync-executor-logs,Args=[-l,s3://path/to/logfiles]
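For reference, that fragment is passed to aws emr create-cluster. A minimal sketch, where the cluster name, release label, and instance settings are placeholders of my own (only the --bootstrap-actions part comes from above, and on newer EMR releases the Spark work directory used in the script may live elsewhere):

aws emr create-cluster \
  --name "spark-cluster" \
  --release-label emr-5.36.0 \
  --applications Name=Spark \
  --instance-type m5.xlarge \
  --instance-count 3 \
  --use-default-roles \
  --bootstrap-actions Path=s3://path/to/my/script/sync-executor-logs.sh,Name=Sync-executor-logs,Args=[-l,s3://path/to/logfiles]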