
Google cloud function with wand stopped working

I have set up 3 Google Cloud Storage buckets and 3 functions (one for each bucket) that trigger when a PDF file is uploaded to a bucket. The functions convert the PDF to a PNG image and do further processing.

When I try to create a 4th bucket with a similar function, it strangely does not work. Even if I copy one of the existing 3 functions, it still fails with this error:

Traceback (most recent call last):
  File "/env/local/lib/python3.7/site-packages/google/cloud/functions_v1beta2/worker.py", line 333, in run_background_function
    _function_handler.invoke_user_function(event_object)
  File "/env/local/lib/python3.7/site-packages/google/cloud/functions_v1beta2/worker.py", line 199, in invoke_user_function
    return call_user_function(request_or_event)
  File "/env/local/lib/python3.7/site-packages/google/cloud/functions_v1beta2/worker.py", line 196, in call_user_function
    event_context.Context(**request_or_event.context))
  File "/user_code/main.py", line 27, in pdf_to_img
    with Image(filename=tmp_pdf, resolution=300) as image:
  File "/env/local/lib/python3.7/site-packages/wand/image.py", line 2874, in __init__
    self.read(filename=filename, resolution=resolution)
  File "/env/local/lib/python3.7/site-packages/wand/image.py", line 2952, in read
    self.raise_exception()
  File "/env/local/lib/python3.7/site-packages/wand/resource.py", line 222, in raise_exception
    raise e
wand.exceptions.PolicyError: not authorized `/tmp/tmphm3hiezy' @ error/constitute.c/ReadImage/412

It baffles me why the same functions work on the existing buckets but not on the new one.

UPDATE: Even this stripped-down version does not work (it fails with a "cache resources exhausted" error):

In requirements.txt:

google-cloud-storage
wand

In main.py:

import tempfile

from google.cloud import storage
from wand.image import Image

storage_client = storage.Client()

def pdf_to_img(data, context):
    file_data = data
    pdf = file_data['name']

    if pdf.startswith('v-'):
        return 

    bucket_name = file_data['bucket']

    blob = storage_client.bucket(bucket_name).get_blob(pdf)

    _, tmp_pdf = tempfile.mkstemp()
    _, tmp_png = tempfile.mkstemp()

    tmp_png = tmp_png+".png"

    blob.download_to_filename(tmp_pdf)
    with Image(filename=tmp_pdf) as image:
        image.save(filename=tmp_png)

    print("Image created")
    new_file_name = "v-"+pdf.split('.')[0]+".png"
    blob.bucket.blob(new_file_name).upload_from_filename(tmp_png)

The code above is only supposed to create a PNG copy of the file that is uploaded to the bucket.

Naveed asked Nov 14 '18 09:11



2 Answers

Because the vulnerability has been fixed in Ghostscript but not updated in ImageMagick, the workaround for converting PDFs to images in Google Cloud Functions is to use the ghostscript Python wrapper (the ghostscript package pinned in requirements.txt below) and request the PDF-to-PNG conversion directly from Ghostscript, bypassing ImageMagick.

requirements.txt

google-cloud-storage
ghostscript==0.6

main.py

import locale
import tempfile
import ghostscript

from google.cloud import storage

storage_client = storage.Client()

def pdf_to_img(data, context):
    file_data = data
    pdf = file_data['name']

    if pdf.startswith('v-'):
        return 

    bucket_name = file_data['bucket']

    blob = storage_client.bucket(bucket_name).get_blob(pdf)

    _, tmp_pdf = tempfile.mkstemp()
    _, tmp_png = tempfile.mkstemp()

    tmp_png = tmp_png+".png"

    blob.download_to_filename(tmp_pdf)

    # use Ghostscript to render the downloaded PDF to a PNG at tmp_png (300 dpi)
    args = [
        "pdf2png", # actual value doesn't matter
        "-dSAFER",
        "-sDEVICE=pngalpha",
        "-o", tmp_png,
        "-r300", tmp_pdf
        ]
    # the above arguments have to be bytes, encode them
    encoding = locale.getpreferredencoding()
    args = [a.encode(encoding) for a in args]
    #run the request through ghostscript
    ghostscript.Ghostscript(*args)

    print("Image created")
    new_file_name = "v-"+pdf.split('.')[0]+".png"
    blob.bucket.blob(new_file_name).upload_from_filename(tmp_png)

This gets you around the issue and keeps all the processing in GCF. Note that the code above, adapted from yours, works for single-page PDFs. My use case was multipage PDF conversion; the Ghostscript code and solution for that are in this question.
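For the multipage case mentioned above, here is a minimal sketch of one way to do it, assuming the same ghostscript wrapper as in requirements.txt. Ghostscript expands a %d-style pattern in the output file name to the page number, so a single call can write one PNG per page. The helper names build_gs_args and pdf_to_pngs and the page-%03d.png naming are my own choices for illustration, not part of any library.

```python
import os


def build_gs_args(pdf_path, out_dir, dpi=300):
    """Build Ghostscript arguments that write one PNG per PDF page.

    Hypothetical helper: the name and the page-%03d.png pattern are
    illustrative assumptions.
    """
    # Ghostscript replaces %03d with the page number: page-001.png, page-002.png, ...
    out_pattern = os.path.join(out_dir, "page-%03d.png")
    return [
        "pdf2png",  # placeholder program name; the actual value doesn't matter
        "-dSAFER",
        "-sDEVICE=pngalpha",
        "-r{}".format(dpi),
        "-o", out_pattern,
        pdf_path,
    ]


def pdf_to_pngs(pdf_path, out_dir, dpi=300):
    """Render every page of pdf_path into out_dir and return the PNG paths."""
    import locale
    import ghostscript  # imported lazily; needs the ghostscript package

    # As in the single-page code above, the arguments are encoded to bytes.
    encoding = locale.getpreferredencoding()
    args = [a.encode(encoding) for a in build_gs_args(pdf_path, out_dir, dpi)]
    ghostscript.Ghostscript(*args)

    # Collect whatever pages Ghostscript produced, in page order.
    return sorted(
        os.path.join(out_dir, name)
        for name in os.listdir(out_dir)
        if name.endswith(".png")
    )
```

Inside the function you would probably call pdf_to_pngs with a directory from tempfile.TemporaryDirectory() and upload each returned path as its own blob, so the temporary files are removed when the block exits.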

timhj answered Sep 24 '22 16:09



This actually seems to be a show stopper for any ImageMagick functionality that involves the PDF format. Similar code that we deployed on Google App Engine via a custom Docker image fails with the same "not authorized" error.

I am not sure how to edit the policy.xml file on GAE or GCF but a line there has to be changed to:

<policy domain="coder" rights="read|write" pattern="PDF" />

@Dustin: Do you have a bug link where we can follow the progress?

Update:

I fixed it in my Google App Engine container by adding a line to the Docker image. It rewrites the contents of policy.xml right after ImageMagick is installed.

RUN sed -i 's/rights="none"/rights="read|write"/g' /etc/ImageMagick-6/policy.xml
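Note that the sed expression above rewrites every rights="none" entry in policy.xml, not only the PDF one. If you would rather touch just the PDF coder rule, a narrower edit is possible. The following is only a sketch in Python: SAMPLE_POLICY is an assumed excerpt of what the file typically looks like (check your own image), and allow_pdf is a hypothetical helper name.

```python
import re

# Assumed excerpt of an ImageMagick policy.xml; the real file may differ.
SAMPLE_POLICY = """\
<policymap>
  <policy domain="coder" rights="none" pattern="PS" />
  <policy domain="coder" rights="none" pattern="PDF" />
</policymap>
"""

# Match only the coder rule whose pattern is exactly "PDF".
PDF_RULE = re.compile(
    r'(<policy\s+domain="coder"\s+rights=")none("\s+pattern="PDF"\s*/>)'
)


def allow_pdf(policy_xml):
    """Return policy_xml with the PDF coder rule set to read|write."""
    return PDF_RULE.sub(r'\g<1>read|write\g<2>', policy_xml)


if __name__ == "__main__":
    print(allow_pdf(SAMPLE_POLICY))
```

With this approach the other disabled coders (PS in the sample) stay blocked, and running the function a second time changes nothing.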
Hasan Rafiq answered Sep 26 '22 16:09
