
ValueError: seek of closed file Working on PyPDF2 and getting this error

I am trying to extract text from a PDF file. Below is the code:

from PyPDF2 import PdfFileReader
with open('HTTP_Book.pdf', 'rb') as file:
    pdf = PdfFileReader(file)

page = pdf.getPage(1)
#print(dir(page))
print(page.extractText())

This gives me the error:

ValueError: seek of closed file

When I move the code under the with statement, it works fine. My question is: why is this so? I have already stored the information in the 'pdf' object, so I should be able to access it outside the block.

Jeet Singh asked May 05 '19 11:05



1 Answer

PdfFileReader expects a seekable, open stream. It does not load the entire file into memory, so you have to keep the file open to call methods like getPage. Your hypothesis that creating a reader automatically reads in the whole file is incorrect.

A with statement operates on a context manager, such as a file. When the with block ends, the context manager's __exit__ method is called. In this case, it closes the file handle that your PdfFileReader is trying to use to get the second page.
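The same behavior can be reproduced with the standard library alone. The following is a minimal sketch, not PyPDF2's actual internals: a throwaway temporary file stands in for HTTP_Book.pdf, and the hypothetical name handle plays the role of the reference a lazy reader would keep to the stream.

```python
import os
import tempfile

# Create a small throwaway file so the sketch is self-contained
# (assumption: these bytes merely stand in for a real PDF).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"some bytes standing in for a PDF")
    path = tmp.name

with open(path, "rb") as file:
    handle = file      # keep a reference to the stream, as a lazy reader would
    handle.seek(0)     # works: the file is still open inside the block

# Leaving the block called file.__exit__(), which closed the file.
print(handle.closed)

try:
    handle.seek(0)     # the kind of call a reader makes when fetching a page
except ValueError as exc:
    error_message = str(exc)
    print(error_message)

os.remove(path)        # clean up the temporary file
```

Holding a reference to the file object does not keep the file open; only the code inside the with block sees an open handle.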

As you found out, the correct procedure is to read what you need from the PDF before you close the file. If, and only if, your program needs the PDF open until the very end, you can pass the file name directly to PdfFileReader. There is no (documented) way to close the file after that though, so I would recommend the approach you already found:

from PyPDF2 import PdfFileReader
with open('HTTP_Book.pdf', 'rb') as file:
    pdf = PdfFileReader(file)
    page = pdf.getPage(1)
    print(page.extractText())
# file is closed here, pdf will no longer do its job
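If the reader object really does have to outlive the with block, one possible workaround is to copy the file into an in-memory stream first. This is only a sketch under assumptions: it uses placeholder bytes instead of a real PDF, the name buffer is made up for illustration, and it assumes the file is small enough to fit in memory. Since PdfFileReader works on a seekable binary stream, an io.BytesIO object should be usable in place of the file, but that step is left as a comment here rather than executed.

```python
import io
import os
import tempfile

# Throwaway file with placeholder content (assumption: not a real PDF).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"placeholder bytes, not a real PDF")
    path = tmp.name

with open(path, "rb") as file:
    # Copy the whole file into memory while it is still open.
    buffer = io.BytesIO(file.read())

# The file on disk is closed, but the in-memory copy is still open and seekable.
print(file.closed, buffer.closed)
buffer.seek(0)
first_bytes = buffer.read(11)
print(first_bytes)

# With a real PDF, `buffer` could be handed to PdfFileReader instead of `file`,
# e.g. pdf = PdfFileReader(buffer), and used after the with block.

os.remove(path)  # clean up the temporary file
```

The trade-off is memory: the entire file is held in RAM, which is exactly what PdfFileReader avoids by reading lazily from an open file.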
Mad Physicist answered Oct 17 '22 02:10