
Reading pdf files line by line using python

Tags:

python

pypdf2

I used the following code to read a PDF file, but it does not return any text. What could be the reason?

from PyPDF2 import PdfFileReader

reader = PdfFileReader("example.pdf")
contents = reader.pages[0].extractText().split("\n")
print(contents)

The output is [u''] rather than the text of the page.

Rahul Pipalia asked Jul 08 '17 04:07



1 Answer

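import re
from PyPDF2 import PdfFileReader

reader = PdfFileReader("example.pdf")

for page in reader.pages:
    # Lowercase the page text so the search is case-insensitive.
    text = page.extractText().lower()
    # Split into lines; iterating over the string itself would yield single characters.
    for line in text.splitlines():
        if re.search("abc", line):
            print(line)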

I use this to go through a PDF page by page, search each line for key terms, and process the matches further.
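If it helps, the line search can be kept separate from the PDF library so it is easy to check on plain strings first. This is only a rough sketch: `find_matching_lines` and `sample_pages` are names made up for illustration, and it assumes each page's text has already been extracted into an ordinary string (for example with `page.extractText()`).

```python
import re
from typing import Iterable, Iterator, Tuple


def find_matching_lines(page_texts: Iterable[str], pattern: str) -> Iterator[Tuple[int, str]]:
    """Yield (page_number, line) for every line that matches pattern, ignoring case."""
    regex = re.compile(pattern, re.IGNORECASE)
    # Pages are numbered from 1 to match how a reader would count them.
    for page_number, text in enumerate(page_texts, start=1):
        # splitlines() gives whole lines; looping over the string gives characters.
        for line in text.splitlines():
            if regex.search(line):
                yield page_number, line.strip()


# Hypothetical stand-in for the text extracted from two pages.
sample_pages = ["Intro\nThe ABC method\n", "Nothing here\nabc again"]
matches = list(find_matching_lines(sample_pages, "abc"))
print(matches)
```

With a real file, the list of strings would come from the extraction step, e.g. `[page.extractText() for page in reader.pages]`. If that list contains only empty strings, the problem is in the extraction rather than in the search.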

Piyush Rumao answered Nov 01 '22 09:11
