Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to use the Python pdf2image module (thus poppler) on Google Cloud Function?

I tried convert PDF to JPEG on Google Cloud Functions. I used the Python module pdf2image. But I have no idea how to solve the errors No such file or directory: 'pdfinfo' and "Unable to get page count. Is poppler installed and in PATH? on the cloud function.

The error code is very similar to this question. pdf2image is a wrapper around "pdftoppm" and "pdftocairo" of poppler. But how can I install the poppler package on google cloud function, and add it to PATH? I can't find relevant references for it. It is even possible? If not, what could be done?

There is also this question, but it isn't useful.

The code look something like the following. Entry point is process_image.

import requests
from pdf2image import convert_from_path

def process_image(event, context):
    # Download sample pdf file
    url = 'https://www.adobe.com/support/products/enterprise/knowledgecenter/media/c4611_sample_explain.pdf'
    r = requests.get(url, allow_redirects=True)
    open('/tmp/sample.pdf', 'wb').write(r.content)

    # Error occur on this line
    pages = convert_from_path('/tmp/sample.pdf')

    # Save pages to /tmp
    for idx, page in enumerate(pages):
        output_file_path = f"/tmp/{str(idx)}.jpg"
        page.save(output_file_path, 'JPEG')
        # To be saved to cloud storage

Requirement.txt:

requests==2.25.1
pdf2image==1.14.0

This is the error code I get:

Traceback (most recent call last):
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/pdf2image/pdf2image.py", line 441, in pdfinfo_from_path
    proc = Popen(command, env=env, stdout=PIPE, stderr=PIPE)
  File "/opt/python3.8/lib/python3.8/subprocess.py", line 858, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/opt/python3.8/lib/python3.8/subprocess.py", line 1706, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'pdfinfo'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/functions_framework/__init__.py", line 149, in view_func
    function(data, context)
  File "/workspace/main.py", line 11, in process_image
    pages = convert_from_path('/tmp/sample.pdf')
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/pdf2image/pdf2image.py", line 97, in convert_from_path
    page_count = pdfinfo_from_path(pdf_path, userpw, poppler_path=poppler_path)["Pages"]
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/pdf2image/pdf2image.py", line 467, in pdfinfo_from_path
    raise PDFInfoNotInstalledError(
pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?

Thanks in advance for any help.

like image 604
lamty101 Avatar asked Dec 19 '25 09:12

lamty101


1 Answers

This error occurs because poppler package doesn't work in Cloud Functions as it requires certain files written to the system. Unfortunately, you cannot write to file system in serverless products like Cloud Functions.

You may want to try methods, described in another thread, Cloud Functions for Firebase - Converting PDF to image or consider using GCP Compute Engine that has access to the whole system.

like image 180
Farid Shumbar Avatar answered Dec 21 '25 22:12

Farid Shumbar



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!