Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I package or install an entire program to run in an AWS Lambda function

If this is a case of using Lambda entirely the wrong way, please let me know.

I want to install Scrapy into a Lambda function and invoke the function to begin a crawl. My first problem is how to install it, so that all of the paths are correct. I installed the program using the directory to be zipped as its root, so the zip contains all of the source files and the executable. I am basing my efforts on this article. In the line it says to include at the beginning of my function, where does the "process" variable come from? I have tried,

var process = require('child_process');
var exec = process.exec;
process.env['PATH'] = process.env['PATH'] + ':' + 
process.env['LAMBDA_TASK_ROOT']

but I get the error,

"errorMessage": "Cannot read property 'PATH' of undefined",
"errorType": "TypeError",

Do I need to include all of the library files, or just the executable from /usr/lib ? How do I include that one line of code the article says I need?

Edit: I tried moving the code into a child_process.exec, and received the error

"errorMessage": "Command failed: /bin/sh: process.env[PATH]: command not found\n/bin/sh: scrapy: command not found\n"

Here is my current, entire function

console.log("STARTING");
var process = require('child_process');
var exec = process.exec;

exports.handler = function(event, context) {    
    //Run a fixed Python command.
    exec("process.env['PATH'] = process.env['PATH'] + ':' + process.env['LAMBDA_TASK_ROOT']; scrapy crawl backpage2", function(error, stdout) {
        console.log('Scrapy returned: ' + stdout + '.');
        context.done(error, stdout);
    });

};
like image 724
michaelAdam Avatar asked Jul 31 '15 14:07

michaelAdam


People also ask

Can we use packages with AWS Lambda?

Your AWS Lambda function's code consists of scripts or compiled programs and their dependencies. You use a deployment package to deploy your function code to Lambda. Lambda supports two types of deployment packages: container images and . zip file archives.

How big a package can you load into AWS Lambda?

Deployment Package Limits There is a hard limit of 50MB for compressed deployment package with AWS Lambda and an uncompressed AWS Lambda hard limit of 250MB.


3 Answers

You can definitely execute arbitrary processes from your Node code in Lambda. We're doing this to run a game server that processes player turns, for example. I'm not sure exactly what you're trying to accomplish with Scrapy, but remember that your entire Lambda invocation can only live for a maximum of 60 seconds right now on AWS! But if that's OK for you, here is a completely working example of how we are executing our own arbitrary linux process from Lambda. (In our case, it's a compiled binary - really doesn't matter as long as you have something that can run on the Linux image they use).

var child_process = require('child_process');
var path = require('path');

exports.handler = function (event, context) {
    // If timeout is provided in context, get it. Otherwise, assume 60 seconds
    var timeout = (context.getRemainingTimeInMillis && (context.getRemainingTimeInMillis() - 1000)) || 60000;
    // The task root is the directory with the code package.
    var taskRoot = process.env['LAMBDA_TASK_ROOT'] || __dirname;
    // The command to execute.
    var command;

    // Set up environment variables
    process.env.HOME = '/tmp'; // <-- for naive processes that assume $HOME always works! You might not need this.

    // On linux the executable is in task root / __dirname, whichever was defined
    process.env.PATH += ':' + taskRoot;
    command = 'bash -c "cp -R /var/task/YOUR_THING /tmp/; cd /tmp; ./YOUR_THING ARG1 ARG2 ETC"'

    child_process.exec(command, {
        timeout: timeout,
        env: process.env
    }, function (error, stdout, stderr) {
        console.log(stdout);
        console.log(stderr);
        context.done(null, {exit: true, stdout: stdout, stderr: stderr});
    });
};
like image 60
Clay Fowler Avatar answered Nov 15 '22 08:11

Clay Fowler


Problem with your example is modifying node global variable process by

var process = require('child_process');

This way you can't alter PATH environment variable, and thus the reason why you are getting Cannot read property 'PATH' of undefined.

Just use different name for loaded child_process library, e.g.

//this gets updated to child_process
var child_process = require('child_process');
var exec = child_process.exec;

//global process variable is still accessible
process.env['PATH'] = process.env['PATH'] + ':' + 
process.env['LAMBDA_TASK_ROOT']

//start new process with your binary
exec('path/to/your/binary',......
like image 45
toske Avatar answered Nov 15 '22 08:11

toske


Here is a Lambda function that runs a python script setting the current working directory to the same directory as the Lambda function. You may be able to use this with some modifications to the relative location of your python script.

var child_process = require("child_process");

exports.handler = function(event, context) {
    var execOptions = {
        cwd: __dirname
    };
    child_process.exec("python hello.py", execOptions, function (error, stdout, stderr) {
        if (error) {
            context.fail(error);
        } else {
            console.log("stdout:\n", stdout);
            console.log("stderr:\n", stderr);
            context.succeed();
        }
    });
};
like image 37
James Avatar answered Nov 15 '22 06:11

James