Reading pdf files line by line using python

Question

I used the following code to read a PDF file, but it returns no text. What could be the reason?

from PyPDF2 import PdfFileReader

reader = PdfFileReader("example.pdf")
contents = reader.pages[0].extractText().split("\n")
print(contents)

The output is [u''] instead of the text on the page.

Piyush Rumao · Accepted Answer

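import re
from PyPDF2 import PdfFileReader

reader = PdfFileReader("example.pdf")

# Go through the document page by page
for page in reader.pages:
    text = page.extractText()
    # Lowercase the text so the search is case-insensitive
    text_lower = text.lower()
    # Split into lines; iterating over the string directly would yield single characters
    for line in text_lower.splitlines():
        if re.search("abc", line):
            print(line)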

I use this to iterate over a PDF page by page, search each page for key terms, and process the matching lines further.
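The same idea can be sketched as a small reusable helper. This is only an illustration under some assumptions: the names find_matching_lines and pdf_page_texts are made up for this example, and the PDF part assumes the legacy PyPDF2 API used above (in PyPDF2 3.x the equivalents are named PdfReader and extract_text). Unlike the loop above, it keeps the original casing of each line and matches case-insensitively through the regex flag.

```python
import re
from typing import Iterable, Iterator, Tuple


def find_matching_lines(page_texts: Iterable[str], pattern: str) -> Iterator[Tuple[int, str]]:
    """Yield (page_number, line) for every line matching pattern, ignoring case."""
    regex = re.compile(pattern, re.IGNORECASE)
    for page_number, text in enumerate(page_texts, start=1):
        # An empty string means the extractor found no text on that page.
        for line in (text or "").splitlines():
            if regex.search(line):
                yield page_number, line.strip()


def pdf_page_texts(path: str) -> Iterator[str]:
    """Yield the extracted text of each page (hypothetical helper, legacy PyPDF2 API assumed)."""
    # Imported lazily so find_matching_lines can be used without PyPDF2 installed.
    from PyPDF2 import PdfFileReader

    reader = PdfFileReader(path)
    for page in reader.pages:
        yield page.extractText()
```

It could be used as: for page_number, line in find_matching_lines(pdf_page_texts("example.pdf"), "abc"): print(page_number, line). If every page yields an empty string, as in the question, nothing will match; one possible cause is that the PDF contains scanned images rather than a text layer.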

Tags:

python

pypdf2

Rahul Pipalia
