 

Extracting text from a PDF on Windows 8 using Python 3.5.0

I want to extract text from a PDF file using Python 3.5.0 with the slate package on Windows 8.

Problem: Although the slate package installed successfully, importing it fails with the errors below. Please suggest what I am missing.

Errors:

Traceback (most recent call last):
  File "", line 1, in <module>
    import slate
  File "C:\Users\name\AppData\Local\Programs\Python\Python35-32\lib\site-packages\slate-0.4.1-py3.5.egg\slate\__init__.py", line 66, in <module>
    from slate import PDF
ImportError: cannot import name 'PDF'

B Singh asked Mar 15 '23 15:03



1 Answer

You could try pdftotext (a Windows version is available) from the Poppler library.

As a standalone program, it doesn't require Python. But I often use it from Python as a subprocess, like this:

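import subprocess

# -layout keeps the original physical layout, -q suppresses messages,
# and the final '-' sends the extracted text to stdout instead of a file.
args = ['pdftotext', '-layout', '-q', 'input.pdf', '-']
txt = subprocess.check_output(args, universal_newlines=True)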
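
If you call this from more than one place, the same idea can be wrapped in a small helper. This is only a sketch: the function names build_pdftotext_args and pdf_to_text are made up for illustration, it assumes pdftotext is on your PATH, and it assumes the output is UTF-8 (the default for Poppler's pdftotext as far as I know).

```python
import shutil
import subprocess


def build_pdftotext_args(pdf_path, layout=True):
    """Return the argument list for a pdftotext call that writes to stdout."""
    args = ["pdftotext", "-q"]
    if layout:
        args.append("-layout")
    # A trailing "-" tells pdftotext to write the text to stdout, not a file.
    args += [pdf_path, "-"]
    return args


def pdf_to_text(pdf_path, layout=True):
    """Extract the text of pdf_path by running pdftotext as a subprocess."""
    # Fail early with a clear message if the executable cannot be found.
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext was not found on PATH")
    try:
        raw = subprocess.check_output(build_pdftotext_args(pdf_path, layout))
    except subprocess.CalledProcessError as exc:
        raise RuntimeError(
            "pdftotext exited with status {}".format(exc.returncode)
        ) from exc
    # Assumption: the output is UTF-8; undecodable bytes are replaced.
    return raw.decode("utf-8", errors="replace")
```

Usage would then be something like txt = pdf_to_text('input.pdf'). Decoding the bytes yourself avoids depending on the Windows console code page, which is what universal_newlines=True would use.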
Roland Smith answered Mar 24 '23 01:03
