Running pdftotext from Python

I am trying to convert a PDF document to a text document using the pdftotext utility.

I need to call this application from a Python script, the same way I would from the command prompt, to convert the file.

I have the following code:

import os 
import subprocess

path = "C:\\Users\\..." 
pdffname = "pdffilename.pdf" 
txtfname = "txtfilename.txt"

subprocess.call(['pdftotext', '-layout', 
     os.path.join(path, pdffname),
     os.path.join(path, txtfname)])

When I run this code, I get the following error:

  File "C:/Users/.../code-1.py", line 44, in <module>
    os.path.join(path, txtfname)])
  File "C:\Anaconda\lib\subprocess.py", line 522, in call
    return Popen(*popenargs, **kwargs).wait()
  File "C:\Anaconda\lib\subprocess.py", line 710, in __init__
    errread, errwrite)
  File "C:\Anaconda\lib\subprocess.py", line 958, in _execute_child
    startupinfo)
WindowsError: [Error 2] The system cannot find the file specified
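The error is raised inside _execute_child, i.e. while Windows is trying to start the process, so "the file" it cannot find is the pdftotext program itself rather than the PDF (a missing PDF would let pdftotext start and report its own error). A quick way to check whether Python can see the program is shutil.which. This is a minimal sketch; find_executable is a hypothetical helper name, and shutil.which needs Python 3.3 or later (the WindowsError in the traceback suggests Python 2, where it is not available).

```python
import shutil

def find_executable(name):
    """Return the full path to `name` if it can be found on PATH, else None.

    On Windows, shutil.which also tries the extensions listed in PATHEXT,
    so "pdftotext" will match "pdftotext.exe".
    """
    return shutil.which(name)

exe = find_executable("pdftotext")
if exe is None:
    # This is the situation that produces "[Error 2] The system cannot
    # find the file specified": the program is not on PATH.
    print("pdftotext was not found on PATH; pass its full path instead.")
else:
    print("pdftotext found at:", exe)
```

If this prints the "not found" message, either add the folder containing pdftotext.exe to PATH or call it by its full path.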

How can I call the pdftotext application from Python to convert a PDF to a text file?

annamalai muthuraman asked Jun 15 '26 23:06


1 Answer

I had the same error, except with Popen. I fixed it by providing the full path to pdftotext.exe in the subprocess call. Don't forget to escape your backslashes (or use a raw string).
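A minimal sketch of that fix, assuming pdftotext.exe lives at C:\Program Files\xpdf\bin64\pdftotext.exe. That location is hypothetical; use wherever the executable actually is on your machine. build_command and pdf_to_text are illustrative helper names, not part of any library. subprocess.check_call is used instead of subprocess.call so that a failed conversion raises an exception rather than being silently ignored.

```python
import subprocess

# Hypothetical install location: replace with the real path to pdftotext.exe.
# r"..." is a raw string, so backslashes need no escaping; the equivalent
# ordinary string is "C:\\Program Files\\xpdf\\bin64\\pdftotext.exe".
PDFTOTEXT_EXE = r"C:\Program Files\xpdf\bin64\pdftotext.exe"

def build_command(exe, pdf_path, txt_path, layout=True):
    """Build the argument list for pdftotext (illustrative helper)."""
    cmd = [exe]
    if layout:
        cmd.append("-layout")  # same flag as in the question
    cmd.extend([pdf_path, txt_path])
    return cmd

def pdf_to_text(pdf_path, txt_path, exe=PDFTOTEXT_EXE):
    """Run pdftotext by its full path.

    check_call raises CalledProcessError if pdftotext exits with a non-zero
    status, and OSError if the executable itself cannot be found.
    """
    subprocess.check_call(build_command(exe, pdf_path, txt_path))

# Usage, with the paths from the question:
#   import os
#   path = "C:\\Users\\..."
#   pdf_to_text(os.path.join(path, "pdffilename.pdf"),
#               os.path.join(path, "txtfilename.txt"))
```

Passing the command as a list rather than a single string means a path containing spaces, such as one under Program Files, is still handed to pdftotext as one argument without manual quoting.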

I do not know much about Anaconda, and I have not tested this myself, but I believe Conda may have had an issue with referencing scripts on Windows; see "fix references to scripts on windows".

astrimbu answered Jun 17 '26 12:06