I am using the Python library PyPDF2 and trying to read a PDF file using PdfReader. It works fine for a local PDF file. Is there a way to read a PDF file from a Google Cloud Storage bucket (gs://bucket_name/object_name)?
from PyPDF2 import PdfReader
with open('testpdf.pdf', 'rb') as f1:
    reader = PdfReader(f1)
    number_of_pages = len(reader.pages)
Instead of 'testpdf.pdf', how can I provide my Google Cloud Storage object location? Please let me know if anyone has tried this.
You can use the gcsfs library to access files in a GCS bucket. For example:
import gcsfs
from pypdf import PdfReader

gcs_file_system = gcsfs.GCSFileSystem(project="PROJECT_ID")
gcs_pdf_path = "gs://bucket_name/object.pdf"

# Open the object in binary mode; the with block closes it even if parsing fails
with gcs_file_system.open(gcs_pdf_path, "rb") as f_object:
    # Open our PDF file with the PdfReader
    reader = PdfReader(f_object)
    # Get number of pages
    num = len(reader.pages)
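If you would rather not add gcsfs, another option is to download the object into memory with the official google-cloud-storage client and hand the bytes to PdfReader. This is only a rough sketch: split_gs_uri and count_pdf_pages are helper names I made up for illustration, and it assumes google-cloud-storage and pypdf are installed, default credentials are configured, and the PDF is small enough to hold in memory.

```python
from io import BytesIO


def split_gs_uri(uri):
    """Split 'gs://bucket/path/to/object' into (bucket_name, object_name)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    bucket_name, _, object_name = uri[len(prefix):].partition("/")
    if not bucket_name or not object_name:
        raise ValueError(f"URI must name a bucket and an object: {uri!r}")
    return bucket_name, object_name


def count_pdf_pages(uri):
    """Download the object into memory and return its number of pages."""
    # Third-party imports are kept local so split_gs_uri works without them.
    from google.cloud import storage
    from pypdf import PdfReader

    bucket_name, object_name = split_gs_uri(uri)
    client = storage.Client()  # assumes default credentials are available
    blob = client.bucket(bucket_name).blob(object_name)
    data = blob.download_as_bytes()  # whole file is loaded into memory
    reader = PdfReader(BytesIO(data))  # PdfReader accepts a binary stream
    return len(reader.pages)
```

You would call it as count_pdf_pages("gs://bucket_name/object.pdf"). For large files the gcsfs approach above may be preferable, since it gives PdfReader a file-like object instead of downloading everything up front.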