 

mrjob: Invalid bootstrap action path, must be a location in Amazon S3

I am on Windows 7. I installed mrjob, and when I run the example word_count file from the website, it works fine on my local machine. However, I get this error when I try to run it on Amazon EMR. I even tested connecting to Amazon S3 with just boto, and that works.

My mrjob.conf file:

runners:
  emr:
    aws_access_key_id: xxxxxxxxxxxxx
    aws_region: us-east-1
    aws_secret_access_key: xxxxxxxx
    ec2_key_pair: bzy
    ec2_key_pair_file: C:\aa.pem
    ec2_instance_type: m1.small
    num_ec2_instances: 3
    s3_log_uri: s3://myunique/
    s3_scratch_uri: s3://myunique/

Running the following in cmd:

python word_count.py -c mrjob.conf -r emr mytext.txt

fails with the error in the title:

Invalid bootstrap action path, must be a location in Amazon S3

After it was suggested that this was a Windows path issue, I double-checked parse.py in the mrjob source code, and it seems to have the relevant check for dealing with Windows paths:

import re
from urllib.parse import urlparse  # on Python 2: from urlparse import urlparse

# Used to check if the candidate uri is actually a local windows path
# (drive letter, colon, backslash, e.g. C:\).
WINPATH_RE = re.compile(r"^[a-zA-Z]:\\")


def is_windows_path(uri):
    """Return True if *uri* is a windows path."""
    return bool(WINPATH_RE.match(uri))


def is_uri(uri):
    """Return True if *uri* is any sort of URI."""
    if is_windows_path(uri):
        return False

    return bool(urlparse(uri).scheme)
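One way to see what this check does is to run it on a few sample values. The sketch below re-implements the two functions standalone for Python 3 (using urllib.parse.urlparse); the sample paths are hypothetical. It shows that urlparse on its own reports a scheme of "c" for a drive-letter path, which is why the Windows check is needed, and that the check only recognizes a backslash after the drive letter, so a forward-slash path such as C:/aa.pem would still be classified as a URI.

```python
import re
from urllib.parse import urlparse

# Mirrors the quoted check: drive letter, colon, then a backslash.
WINPATH_RE = re.compile(r"^[a-zA-Z]:\\")


def is_windows_path(uri):
    return bool(WINPATH_RE.match(uri))


def is_uri(uri):
    # Windows paths are excluded first, because urlparse would
    # otherwise treat the drive letter as a URI scheme.
    if is_windows_path(uri):
        return False
    return bool(urlparse(uri).scheme)


# Hypothetical sample values (Python string literals, so "\\" is one backslash).
samples = ["C:\\aa.pem", "C:\\\\aa.pem", "C:/aa.pem", "s3://myunique/", "aa.pem"]
for candidate in samples:
    print(repr(candidate), "is_uri:", is_uri(candidate),
          "scheme:", repr(urlparse(candidate).scheme))
```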

What I don't understand is why I still get the error with this updated code, and I'm not sure how to proceed.

asked Apr 22 '14 at 07:04 by KJW


2 Answers

The problem you are experiencing is caused by the backslash (\) in the Windows path, because the backslash is also an escape character. Just double it up and you should not have any more problems.

Change your mrjob.conf file to:

runners:
  emr:
    aws_access_key_id: xxxxxxxxxxxxx
    aws_region: us-east-1
    aws_secret_access_key: xxxxxxxx
    ec2_key_pair: bzy
    ec2_key_pair_file: C:\\aa.pem
    ec2_instance_type: m1.small
    num_ec2_instances: 3
    s3_log_uri: s3://myunique/
    s3_scratch_uri: s3://myunique/

For more information, see: http://yaml.org/spec/1.2/spec.html#id2770814
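A related detail: in YAML, backslash escape sequences are only interpreted inside double-quoted scalars. Written as ec2_key_pair_file: "C:\\aa.pem" (quoted), the value loads with a single backslash; written unquoted as C:\\aa.pem, both backslashes are kept literally. The minimal sketch below assumes those two hypothetical loader outputs and uses Python's ntpath module, which applies Windows path rules, to show that the repeated separator normalizes away, so both strings refer to the same path.

```python
import ntpath

# Hypothetical strings as a YAML loader might return them:
single = "C:\\aa.pem"     # one backslash, e.g. from the double-quoted "C:\\aa.pem"
doubled = "C:\\\\aa.pem"  # two backslashes, e.g. from the unquoted C:\\aa.pem

# ntpath.normpath collapses repeated separators using Windows rules,
# on any operating system.
print(ntpath.normpath(single))
print(ntpath.normpath(doubled))
```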

answered Nov 20 '22 at 16:11 by cchristelis


I had a similar problem and found that it was caused by code I had included from various files, referenced by file paths inside my job. If that is your situation, the same error will occur.
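If a job may contain hardcoded local paths like these, one way to find them before submitting to EMR is to scan the job's source for drive-letter paths. The helper find_windows_paths and its regular expression below are hypothetical, not part of mrjob; it is a rough heuristic that looks for a drive letter, a colon, and a slash or backslash, and skips URIs such as s3:// or http://.

```python
import re

# Hypothetical heuristic: a drive letter not preceded by a word character
# (so "http://" does not match), a colon, a slash or backslash, then
# everything up to whitespace or a quote.
DRIVE_PATH_RE = re.compile(r"""(?<!\w)[A-Za-z]:[\\/][^\s'"]*""")


def find_windows_paths(source):
    """Return drive-letter paths that appear literally in *source*."""
    return DRIVE_PATH_RE.findall(source)


# Hypothetical job source containing two local paths and one S3 URI.
job_source = r'''
STOPWORDS = open("C:\data\stop.txt")
OUTPUT = "s3://myunique/out/"
LOG = 'D:/logs/run.log'
'''

for path in find_windows_paths(job_source):
    print(path)
```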

answered Nov 20 '22 at 15:11 by David Manheim