Possible Duplicate:
How do I Index PDF files and search for keywords?
Create an index out of a PDF.
I think you can use the pyPdf Python library for this (http://pybrary.net/pyPdf/). This code prints the numbers of the pages that contain the search phrase:
from pyPdf import PdfFileReader
# file() is the Python 2 built-in for opening files.
reader = PdfFileReader(file("YourPDFFile.pdf", "rb"))
numberOfPages = reader.getNumPages()
# getPage() is zero-based, so start at 0 to include the first page.
i = 0
while i < numberOfPages:
    oPage = reader.getPage(i)
    text = oPage.extractText()
    if text.find('What are you looking for') != -1:
        print i + 1  # report a 1-based page number
    i += 1
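The same, but working with Python 3:
# pyPdf was written for Python 2; this assumes its fork PyPDF2 in a release
# before 3.0, which keeps the same PdfFileReader API.
from PyPDF2 import PdfFileReader
reader = PdfFileReader(open("YourPDFFile.pdf", "rb"))
numberOfPages = reader.getNumPages()
# getPage() is zero-based, so start at 0 to include the first page.
i = 0
while i < numberOfPages:
    oPage = reader.getPage(i)
    text = oPage.extractText()
    if text.find('What are you looking for') != -1:
        print(i + 1)  # report a 1-based page number
    i += 1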
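Since the question is about building an index rather than finding a single phrase, here is a rough sketch of how the per-page text could be turned into a keyword-to-pages mapping. It assumes the text of each page has already been extracted into a list of strings (for example with extractText() as above); the function name build_index, the sample page texts, and the whitespace and case normalization are my own illustrative choices, not part of pyPdf.

```python
import re
from collections import defaultdict


def build_index(page_texts, keywords):
    """Map each keyword to the 1-based page numbers that contain it.

    page_texts is a hypothetical list of strings, one per page, e.g. the
    results of calling extractText() on each page of the PDF.
    """
    index = defaultdict(list)
    for page_number, text in enumerate(page_texts, start=1):
        # Collapse whitespace runs and lowercase the text so that line
        # breaks or capitalisation in the extracted text do not hide a match.
        normalized = re.sub(r"\s+", " ", text).lower()
        for keyword in keywords:
            if keyword.lower() in normalized:
                index[keyword].append(page_number)
    # Keywords that never occur are simply absent from the result.
    return dict(index)


# Stand-in page texts; in practice these would come from the PDF.
pages = [
    "Introduction to  indexing",
    "Keyword search\nin PDF files",
    "Indexing and keyword search together",
]
index = build_index(pages, ["indexing", "keyword search", "missing"])
for keyword, page_numbers in sorted(index.items()):
    print(keyword, page_numbers)
```

This is plain substring matching, so "index" would also match inside "indexing"; if you need whole-word matches, a regular expression with word boundaries would probably be a better fit.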