Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PdfFileReader: PdfReadError: Could not find xref table at specified location

Tags:

python

pypdf2

I am trying to read Pdf file in python through:

from PyPDF2 import PdfFileReader, PdfFileWriter
test_reader = PdfFileReader(file("test.pdf", "rb"))

Above Line throws error:

PyPDF2.utils.PdfReadError: Could not find xref table at specified location

Any help will be highly appreciated

like image 401
Nitin Bhojwani Avatar asked Dec 05 '15 12:12

Nitin Bhojwani


2 Answers

It's fixed. Actually, there wasn't any problem. Seems, the pdf I was using to test was corrupted one (even though when I opened it, the content was there, which is why I couldn't figure out at first place)

I replaced it with another one and it worked as expected.

like image 84
Nitin Bhojwani Avatar answered Sep 28 '22 04:09

Nitin Bhojwani


You could use qpdf to fix a corrupted PDF, or you could simply use pikepdf (which is based on qpdf) instead of PyPDF2. That library is able to work well with corrupted PDFs because it is based on qpdf.

Example:

import pikepdf
pdf = pikepdf.Pdf.open(file)

Pikepdf docs: https://pikepdf.readthedocs.io/en/latest/

like image 30
Wesley - Synio Avatar answered Sep 28 '22 06:09

Wesley - Synio