I am trying to read a PDF file in Python with:
from PyPDF2 import PdfFileReader, PdfFileWriter
test_reader = PdfFileReader(open("test.pdf", "rb"))
The line above throws this error:
PyPDF2.utils.PdfReadError: Could not find xref table at specified location
Any help would be highly appreciated.
It's fixed. There wasn't actually a problem with the code: the PDF I was testing with seems to have been corrupted. When I opened it, the content was there, which is why I couldn't work this out at first.
I replaced it with another file and it worked as expected.
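For anyone hitting the same error: a PDF normally ends with a startxref keyword followed by the byte offset of its cross-reference section, and the message suggests that offset does not lead anywhere useful. Many viewers quietly work around this, which would explain why the file still opens. Below is a rough, standard-library-only sketch of a check for this. The function name xref_offset_is_valid is made up for illustration, and this is only an approximation, not what PyPDF2 does internally.

```python
import re


def xref_offset_is_valid(data: bytes) -> bool:
    """Return True if the last startxref offset points at an xref section.

    Hypothetical helper for illustration; a simplified check, not PyPDF2's logic.
    """
    # The file tail normally looks like: startxref <offset> %%EOF
    matches = re.findall(rb"startxref\s+(\d+)", data)
    if not matches:
        return False

    # Incrementally updated files have several entries; the last one wins.
    offset = int(matches[-1])
    if offset >= len(data):
        return False

    target = data[offset:offset + 32]
    # A classic table starts with the keyword "xref"; a cross-reference
    # stream (PDF 1.5+) starts with an object header such as "12 0 obj".
    return (
        target.startswith(b"xref")
        or re.match(rb"\d+\s+\d+\s+obj", target) is not None
    )
```

To try it, pass the raw bytes of the file, for example the result of open("test.pdf", "rb").read(). A False result hints that the file is damaged in the way the error describes; a True result does not prove the file is healthy.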
You could use qpdf to repair a corrupted PDF, or simply use pikepdf instead of PyPDF2. pikepdf is built on qpdf, so it copes well with corrupted PDFs.
Example:
import pikepdf
pdf = pikepdf.Pdf.open("test.pdf")  # path to your PDF
Pikepdf docs: https://pikepdf.readthedocs.io/en/latest/
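If you would rather keep using PyPDF2, one option is to let pikepdf open the damaged file and save a fresh copy, then read that copy instead. This is only a sketch: repair_pdf and both path arguments are names I made up, and whether the result is readable depends on how badly the file is damaged.

```python
def repair_pdf(src_path: str, dst_path: str) -> None:
    """Open a possibly damaged PDF with pikepdf and write a clean copy.

    Hypothetical helper for illustration; recovery is not guaranteed.
    """
    # Third-party dependency (pip install pikepdf), imported here so the
    # function can be defined even where pikepdf is not installed.
    import pikepdf

    # Opening the file lets qpdf attempt its recovery of the damaged
    # structure; saving writes the document out again as a new file.
    with pikepdf.open(src_path) as pdf:
        pdf.save(dst_path)
```

For example, repair_pdf("test.pdf", "test_fixed.pdf") followed by opening test_fixed.pdf with PyPDF2 as in the question.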