
ImageMagick not converting pdfs anymore in AWS Lambda

I've had an AWS Lambda function running on S3 objects for the last 18 months, and it died around a month ago after a minor update. I've reverted the update but it's still broken. I've tried the most basic PDF conversion using ImageMagick with no luck, so I think AWS has updated something that caused the pdf module to either be removed or stop working.

I've reduced it to the basic call my core code was making, in Node.js 8.10:

gm(response.Body).setFormat("png").stream((err, stdout, stderr) => {
  if (err) {
    console.log('broken');
    return; // the streams may not be usable when gm reports an error
  }
  const chunks = [];
  stdout.on('data', (chunk) => {
    chunks.push(chunk);
  });
  stdout.on('end', () => {
    console.log('gm done!');
  });
  stderr.on('data', (data) => {
    console.log('std error data ' + data);
  });
});

with the error response:

std error data convert: unable to load module `/usr/lib64/ImageMagick-6.7.8/modules-Q16/coders/pdf.la': file not found

I've also tried moving to Node.js 10.x and using the ImageMagick layer that's available through the AWS Serverless Application Repository. Running the same code there generates this error:

std error data convert: FailedToExecuteCommand `'gs' -sstdout=%stderr -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 '-sDEVICE=pngalpha' -dTextAlphaBits=4 -dGraphicsAlphaBits=4 '-r72x72' '-sOutputFile=/tmp/magick-22TOeBgB4WrfoN%d' '-f/tmp/magick-22KvuEBeuJuyq3' '-f/tmp/magick-22dj24vSktMXsj'' (1) @ error/pdf.c/InvokePDFDelegate/292

In both cases the function works correctly when running on an image file instead.

Based on this, I think both the ImageMagick bundled with the AWS Node.js 8.10 runtime and the layer for Node.js 10.x are missing the pdf module, but I'm unsure how to add it or why it was removed in the first place. What's the best way to fix this function that was previously working?

EDIT

I've downloaded https://github.com/serverlesspub/imagemagick-aws-lambda-2, built the library manually, uploaded it to Lambda and got it working as a layer. However, it doesn't include Ghostscript, which is an optional library for ImageMagick. I've tried adding it to Makefile_ImageMagick; this builds, and the result has some references to Ghostscript, but running it doesn't fix the PDF issue (images still work). What's the best way to add the optional Ghostscript library to the Makefile?

Rudiger asked Jul 17 '19 01:07




4 Answers

I had the same problem. Two cloud services processing thousands of PDF pages a day failing because of the pdf.la not found error.

The solution was to switch from ImageMagick to Ghostscript to convert PDFs to PNGs, and then use ImageMagick on the PNGs (if needed). This way, ImageMagick never has to deal with PDFs and won't need the pdf.la file.

To use Ghostscript on AWS Lambda, include the gs binary in the function's zip file.
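
As a rough sketch of that approach, Ghostscript can be called directly through Node's child_process module, with no ImageMagick involved in the PDF step. The binary location (/opt/bin/gs), the 150 dpi default and the helper names buildGsArgs and pdfToPng are assumptions made for illustration; adjust them to wherever the gs binary ends up in your package.

```javascript
const { execFile } = require('child_process');

// Build the Ghostscript argument list for rendering a PDF to PNG.
// A %d in the output pattern is expanded by Ghostscript to the page number.
function buildGsArgs(inputPdf, outputPattern, dpi = 150) {
  return [
    '-dSAFER',
    '-dBATCH',
    '-dNOPAUSE',
    '-dQUIET',
    '-sDEVICE=png16m',
    `-r${dpi}`,
    `-sOutputFile=${outputPattern}`,
    inputPdf,
  ];
}

// Run Ghostscript and resolve with the output pattern once it exits cleanly.
// gsPath is an assumed location, e.g. a binary shipped in the zip or in a layer.
function pdfToPng(inputPdf, outputPattern, gsPath = '/opt/bin/gs') {
  return new Promise((resolve, reject) => {
    execFile(gsPath, buildGsArgs(inputPdf, outputPattern), (err, stdout, stderr) => {
      if (err) {
        reject(new Error(`gs failed: ${stderr || err.message}`));
      } else {
        resolve(outputPattern);
      }
    });
  });
}
```

A call such as pdfToPng('/tmp/in.pdf', '/tmp/page-%d.png') would write one PNG per page. Both paths are under /tmp because that is the writable directory in the Lambda environment.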

José Augusto Paiva answered Oct 23 '22 19:10


While the other answers helped, there was still a lot of work to get to a working solution, so below is how I managed to fix this, specifically for Node.js.

Download https://github.com/sina-masnadi/lambda-ghostscript

Zip up the bin directory and upload it to Lambda as a layer.

Add https://github.com/sina-masnadi/node-gs to your Node.js modules. You can either upload them as part of your project or, as I did, as a layer (along with all your other required modules).

Add https://github.com/serverlesspub/imagemagick-aws-lambda-2 as a layer. The easiest way to do this is to create a new function in Lambda, select Browse serverless app repository, search for "ImageMagick" and select "image-magick-lambda-layer". (You can also build it yourself and upload it as a layer.)

Add the three layers to your function. I added them in this order:

  1. GhostScript
  2. ImageMagick
  3. NodeJS modules

Add the appPath to the require statement for ImageMagick and GhostScript:

var gm = require("gm").subClass({imageMagick: true, appPath: '/opt/bin/'});
var gs = require('gs');

My code was in an async waterfall, so before my existing processing function I added this function to convert the file to a PNG if it wasn't an image already:

  // Assumes fs is required and fileType is set earlier in the enclosing scope.
  function convertIfPdf(response, next) {
    if (fileType == "pdf") {
      fs.writeFile("/tmp/temp.pdf", response.Body, function(err) {
        if (err) {
          return next(err); // stop the waterfall instead of leaving it hanging
        }
        gs().batch().nopause().executablePath('/opt/bin/./gs').device('png16m').input("/tmp/temp.pdf").output('/tmp/temp.png').exec(function (err, stdout, stderr) {
          if (!err && !stderr) {
            var data = fs.readFileSync('/tmp/temp.png');
            next(null, data);
          } else {
            console.log(err);
            console.log(stderr);
            next(err || new Error('Ghostscript wrote to stderr: ' + stderr));
          }
        });
      });
    } else {
      next(null, response.Body);
    }
  }

From then on you can do what you were previously doing in ImageMagick, as the data is in the same format. There may be better ways to do the PDF conversion, but I was having issues with the gs library unless I was working with files. If there are better ways, let me know.
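
The function above relies on a fileType value that is set elsewhere in my code. If you don't already have one, a possible alternative (an assumption on my part, not what I actually ran) is to check the leading bytes of the S3 body, since PDF files normally start with the signature %PDF-. The helper name isPdf is made up for this sketch.

```javascript
// Return true when the buffer starts with the PDF signature "%PDF-".
// Anything that is not a Buffer, or is too short, is treated as not a PDF.
function isPdf(body) {
  if (!Buffer.isBuffer(body) || body.length < 5) {
    return false;
  }
  return body.subarray(0, 5).toString('latin1') === '%PDF-';
}
```

With this, the condition in convertIfPdf could become isPdf(response.Body) instead of comparing fileType.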

If you are having issues loading the libraries, make sure the path is correct; it depends on how you zipped them up.
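
One way to check the path is to log which candidate location actually contains an executable gs before calling it. This is only a sketch: the helper name findExecutable and the candidate paths are assumptions. Layers are unpacked under /opt, and LAMBDA_TASK_ROOT is the environment variable Lambda sets to the directory holding the function code.

```javascript
const fs = require('fs');
const path = require('path');

// Return the first candidate path that exists and is executable, or null.
function findExecutable(candidates) {
  for (const candidate of candidates) {
    try {
      fs.accessSync(candidate, fs.constants.X_OK);
      return candidate;
    } catch (err) {
      // Missing or not executable: try the next candidate.
    }
  }
  return null;
}

// Assumed locations: a layer's bin directory, then a folder bundled with the function code.
const gsPath = findExecutable([
  '/opt/bin/gs',
  path.join(process.env.LAMBDA_TASK_ROOT || '.', 'ghostscript', 'bin', 'gs'),
]);
console.log(gsPath ? `Using Ghostscript at ${gsPath}` : 'Ghostscript binary not found');
```

If this logs that the binary was not found, the zip was most likely built with an extra or missing top-level directory, or the file lost its executable permission when it was packaged.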

Rudiger answered Oct 23 '22 18:10


You can add a layer to your Lambda function to make it work again until 22/07/2019. The ARN of the layer you need to add is: arn:aws:lambda:::awslayer:AmazonLinux1703

The procedure is described at upcoming-updates-to-the-aws-lambda-execution-environment

Any long-term solution would be wonderful.

Nicolas Oste answered Oct 23 '22 18:10


I had the issue where Ghostscript was no longer found.

Previously, I had referenced Ghostscript via:

var gs = '/usr/bin/gs';

Since AWS Lambda stopped providing that package, I included it directly in my Lambda function, which worked for me. I downloaded the files from https://github.com/sina-masnadi/lambda-ghostscript, placed them in a folder called 'ghostscript', and then referenced the binary like so:

var path = require('path')
var gs = path.join(__dirname,"ghostscript","bin","gs")

Raymond Nguyen answered Oct 23 '22 19:10