
spark-submit EMR Step failing when submitted using boto3 client

I'm trying to run spark-submit as an EMR step using the boto3 client. After I execute the code below, the step is submitted and then fails after a few seconds. The actual command line from the step logs works if I run it manually on the EMR master node.

The controller log is barely readable garbage; it looks as if several processes are writing to it concurrently.

UPD: I also tried command-runner.jar, on EMR versions 4.0.0 and 4.1.0.

Any ideas appreciated.

The code fragment:

import boto3


class ProblemExample:
    def run(self, cluster_id):
        # cluster_id is the ID of the running EMR cluster (job flow)
        session = boto3.Session(profile_name='emr-profile')
        client = session.client('emr')
        response = client.add_job_flow_steps(
            JobFlowId=cluster_id,
            Steps=[
                {
                    'Name': 'string',
                    'ActionOnFailure': 'CONTINUE',
                    'HadoopJarStep': {
                        'Jar': 's3n://elasticmapreduce/libs/script-runner/script-runner.jar',
                        'Args': [
                            '/usr/bin/spark-submit',
                            '--verbose',
                            '--class',
                            'my.spark.job',
                            '--jars', '<dependencies>',
                            '<my spark job>.jar'
                        ]
                    }
                },
            ]
        )
        return response
Robert Navado asked Oct 23 '15 16:10



1 Answer

I finally resolved the problem by escaping the --jars value properly.

spark-submit was failing because it could not find classes, but the error was hard to spot among the messy logs.

The valid example is:

import boto3


class Example:
    def run(self, cluster_id):
        # cluster_id is the ID of the running EMR cluster (job flow)
        session = boto3.Session(profile_name='emr-profile')
        client = session.client('emr')
        response = client.add_job_flow_steps(
            JobFlowId=cluster_id,
            Steps=[
                {
                    'Name': 'string',
                    'ActionOnFailure': 'CONTINUE',
                    'HadoopJarStep': {
                        'Jar': 'command-runner.jar',
                        'Args': [
                            '/usr/bin/spark-submit',
                            '--verbose',
                            '--class',
                            'my.spark.job',
                            # the whole --jars value is wrapped in single quotes
                            '--jars', '\'<comma, separated, dependencies>\'',
                            '<my spark job>.jar'
                        ]
                    }
                },
            ]
        )
        return response
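
If the dependency list is assembled in code, the step definition above could be built by a small helper like the one below. This is only a sketch: build_spark_step and its parameters are hypothetical names, the S3 paths are placeholders, and wrapping the joined value in single quotes simply mirrors what worked for me on EMR 4.x; I have not checked whether other releases need it.

```python
def build_spark_step(name, main_class, app_jar, dependency_jars, quote=True):
    """Build one entry for the Steps list passed to add_job_flow_steps.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # --jars takes a single comma-separated value, so strip stray spaces
    # around each entry before joining.
    jars = ",".join(jar.strip() for jar in dependency_jars)
    if quote:
        # Mirror the answer above: wrap the whole value in single quotes.
        jars = "'" + jars + "'"
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "/usr/bin/spark-submit",
                "--verbose",
                "--class", main_class,
                "--jars", jars,
                app_jar,
            ],
        },
    }


# Placeholder paths, not real buckets.
step = build_spark_step(
    name="string",
    main_class="my.spark.job",
    app_jar="s3://my-bucket/my-spark-job.jar",
    dependency_jars=["s3://my-bucket/dep-a.jar", " s3://my-bucket/dep-b.jar"],
)
```

The resulting dict can then be passed as Steps=[step] to client.add_job_flow_steps together with JobFlowId=cluster_id, exactly as in the example above.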
Robert Navado answered Sep 27 '22 23:09
