Calling pdftotext from a Python script stops working when I move from my local machine to my web host

Question

I wrote a small Python script to parse/extract info from a PDF. I tested it on my local machine, which has Python 2.6.2 and pdftotext version 0.12.4.

I am trying to run this on my web hosting server (Dreamhost). It has Python version 2.5.2 and pdftotext version 3.02.

But when I run the script there, I get the following error at the pdftotext line (I have also checked it with a simple throwaway script): "Error: Couldn't open file '-'"

import os
import subprocess

def ConvertPDFToText(currentPDF):
    pdfData = currentPDF.read()

    tf = os.tmpfile()
    tf.write(pdfData)
    tf.seek(0)

    if len(pdfData) > 0:
        # "-" "-" asks pdftotext to read the PDF from stdin and write the text to stdout
        out, err = subprocess.Popen(["pdftotext", "-layout", "-", "-"], stdin=tf, stdout=subprocess.PIPE).communicate()
        return out
    else:
        return None

Note that I am passing this function the same PDF file, and the script does have access to it. In another function I can email myself the PDF document from the same script running on the web host.

What am I doing wrong? What could differ in the usage of subprocess/Python/pdftotext between my local version and the web host version? I am guessing I will have to modify the command, so any help would be greatly appreciated.
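
For anyone comparing two environments like this, here is a minimal sketch that reports which binary a command name resolves to and what it prints for its version flag. It assumes Python 3.3 or newer (for shutil.which), so it is not something the Python 2.5 host above could run unchanged. describe_tool is a hypothetical helper name, not part of any library.

```python
import shutil
import subprocess


def describe_tool(name, version_flag="-v"):
    """Return (resolved_path, version_text) for a tool, or None if it is not on PATH."""
    path = shutil.which(name)
    if path is None:
        return None
    proc = subprocess.Popen(
        [path, version_flag],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # version banners are often written to stderr
    )
    output, _ = proc.communicate()
    return path, output.decode(errors="replace").strip()
```

Calling describe_tool("pdftotext") on both machines shows whether the same program is being picked up in each place.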

Thanks in advance.

Chaitanya · Accepted Answer

The hint for the answer lay in Noufal's comment: give pdftotext a real filename instead of "-". But a file created with os.tmpfile() doesn't have a filename, so I had to use another module, tempfile. The modified code is given below.

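import subprocess
import tempfile

def ConvertPDFToText(currentPDF):
    pdfData = currentPDF.read()
    if len(pdfData) == 0:
        return None

    # The pdftotext on the host rejected "-" as its input, so use named files.
    tf = tempfile.NamedTemporaryFile()
    tf.write(pdfData)
    tf.flush()  # make sure the data is on disk before pdftotext opens tf.name

    outputTf = tempfile.NamedTemporaryFile()

    # call() waits for pdftotext to finish writing outputTf.name
    subprocess.call(["pdftotext", "-layout", tf.name, outputTf.name])
    return outputTf.read()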
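
A somewhat more defensive variant of the same idea, offered as a sketch rather than a drop-in replacement. It assumes Python 3, and the function name convert_via_named_files and its command parameter are made up for illustration. It writes the PDF to a named file, closes that file before the converter runs, and raises an error if the converter exits with a non-zero status instead of silently returning an empty result.

```python
import os
import subprocess
import tempfile


def convert_via_named_files(pdf_data, command=("pdftotext", "-layout")):
    """Run a file-to-file converter on pdf_data and return its output bytes.

    `command` is the converter plus its options; the input and output
    paths are appended as the last two arguments.
    """
    if not pdf_data:
        return None

    # mkstemp returns an open descriptor and a path. Close the output
    # descriptor right away so only the converter writes to that file.
    in_fd, in_path = tempfile.mkstemp(suffix=".pdf")
    out_fd, out_path = tempfile.mkstemp(suffix=".txt")
    os.close(out_fd)
    try:
        with os.fdopen(in_fd, "wb") as in_file:
            in_file.write(pdf_data)  # closed, hence flushed, before the tool runs

        proc = subprocess.Popen(
            list(command) + [in_path, out_path],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
        _, err = proc.communicate()
        if proc.returncode != 0:
            raise RuntimeError(
                "converter exited with %d: %s"
                % (proc.returncode, err.decode(errors="replace").strip())
            )

        with open(out_path, "rb") as out_file:
            return out_file.read()
    finally:
        # Remove both temporary files whether or not the conversion worked.
        os.remove(in_path)
        os.remove(out_path)
```

Taking the command as a parameter makes it possible to try the function with any program that accepts an input path and an output path, which helps on a machine where pdftotext is missing or behaves differently.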

I am not sure how to give Noufal's comment the points for this answer, though. Perhaps he can cut and paste this answer?

Tags: python, subprocess, scripting, dreamhost, pdftotext
