If this is a case of using Lambda entirely the wrong way, please let me know.
I want to install Scrapy into a Lambda function and invoke the function to begin a crawl. My first problem is how to install it, so that all of the paths are correct. I installed the program using the directory to be zipped as its root, so the zip contains all of the source files and the executable. I am basing my efforts on this article. In the line it says to include at the beginning of my function, where does the "process" variable come from? I have tried,
var process = require('child_process');
var exec = process.exec;
process.env['PATH'] = process.env['PATH'] + ':' +
process.env['LAMBDA_TASK_ROOT']
but I get the error,
"errorMessage": "Cannot read property 'PATH' of undefined",
"errorType": "TypeError",
Do I need to include all of the library files, or just the executable from /usr/lib ? How do I include that one line of code the article says I need?
Edit: I tried moving the code into a child_process.exec, and received the error
"errorMessage": "Command failed: /bin/sh: process.env[PATH]: command not found\n/bin/sh: scrapy: command not found\n"
Here is my current, entire function
console.log("STARTING");
var process = require('child_process');
var exec = process.exec;
exports.handler = function(event, context) {
//Run a fixed Python command.
exec("process.env['PATH'] = process.env['PATH'] + ':' + process.env['LAMBDA_TASK_ROOT']; scrapy crawl backpage2", function(error, stdout) {
console.log('Scrapy returned: ' + stdout + '.');
context.done(error, stdout);
});
};
Your AWS Lambda function's code consists of scripts or compiled programs and their dependencies. You use a deployment package to deploy your function code to Lambda. Lambda supports two types of deployment packages: container images and . zip file archives.
Deployment Package Limits There is a hard limit of 50MB for compressed deployment package with AWS Lambda and an uncompressed AWS Lambda hard limit of 250MB.
You can definitely execute arbitrary processes from your Node code in Lambda. We're doing this to run a game server that processes player turns, for example. I'm not sure exactly what you're trying to accomplish with Scrapy, but remember that your entire Lambda invocation can only live for a maximum of 60 seconds right now on AWS! But if that's OK for you, here is a completely working example of how we are executing our own arbitrary linux process from Lambda. (In our case, it's a compiled binary - really doesn't matter as long as you have something that can run on the Linux image they use).
var child_process = require('child_process');
var path = require('path');
exports.handler = function (event, context) {
// If timeout is provided in context, get it. Otherwise, assume 60 seconds
var timeout = (context.getRemainingTimeInMillis && (context.getRemainingTimeInMillis() - 1000)) || 60000;
// The task root is the directory with the code package.
var taskRoot = process.env['LAMBDA_TASK_ROOT'] || __dirname;
// The command to execute.
var command;
// Set up environment variables
process.env.HOME = '/tmp'; // <-- for naive processes that assume $HOME always works! You might not need this.
// On linux the executable is in task root / __dirname, whichever was defined
process.env.PATH += ':' + taskRoot;
command = 'bash -c "cp -R /var/task/YOUR_THING /tmp/; cd /tmp; ./YOUR_THING ARG1 ARG2 ETC"'
child_process.exec(command, {
timeout: timeout,
env: process.env
}, function (error, stdout, stderr) {
console.log(stdout);
console.log(stderr);
context.done(null, {exit: true, stdout: stdout, stderr: stderr});
});
};
Problem with your example is modifying node global variable process
by
var process = require('child_process');
This way you can't alter PATH environment variable, and thus the reason why you are getting Cannot read property 'PATH' of undefined
.
Just use different name for loaded child_process
library, e.g.
//this gets updated to child_process
var child_process = require('child_process');
var exec = child_process.exec;
//global process variable is still accessible
process.env['PATH'] = process.env['PATH'] + ':' +
process.env['LAMBDA_TASK_ROOT']
//start new process with your binary
exec('path/to/your/binary',......
Here is a Lambda function that runs a python script setting the current working directory to the same directory as the Lambda function. You may be able to use this with some modifications to the relative location of your python script.
var child_process = require("child_process");
exports.handler = function(event, context) {
var execOptions = {
cwd: __dirname
};
child_process.exec("python hello.py", execOptions, function (error, stdout, stderr) {
if (error) {
context.fail(error);
} else {
console.log("stdout:\n", stdout);
console.log("stderr:\n", stderr);
context.succeed();
}
});
};
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With