google-cloud-vision: how to read a PDF file

I am using the Google OCR API to read both images and PDF files. I can read and process image files. For PDF files, however, the Google OCR API documentation says the document must first be stored in Google Cloud Storage.

Due to data confidentiality, I can't store my data in Google Cloud, so I want to send the PDF from my local system and have its text read. Is it possible to submit a PDF from local disk for processing instead of first uploading the file to Google Cloud?

ZeeKhan asked Aug 24 '18 01:08


1 Answer

As you said, it's not possible to process a PDF directly from local disk. I filed a Feature Request [1] on your behalf so that you can follow updates there.

In the meantime, here is a possible workaround that might satisfy your data confidentiality concerns. It consists of using the Cloud Storage client libraries [2] to both upload and delete the files:

  1. Start with the PDF file on your local machine and no bucket containing it.
  2. Upload it to a bucket [3].
  3. Pass the resulting bucket and file URI to the Cloud Vision API to read the PDF, and have it store the result in a bucket.
  4. Download the result file to your local machine [4].
  5. Delete both the PDF file and the result file from the bucket(s) [5].

This should work as long as you don't mind having those files in buckets for a brief period of time.
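
The steps above could be sketched in Python roughly as follows. This is only a minimal sketch, assuming the google-cloud-storage and google-cloud-vision (2.x or later) client libraries and a bucket that already exists. The function names ocr_local_pdf, run_vision_ocr and first_page, the tmp-ocr/ prefix, and the assumption that result files are named like output-1-to-2.json are mine for illustration, not something taken from the question or the documentation links above.

```python
import json
import re
import uuid
from pathlib import Path


def first_page(blob):
    """Sort key: first page number in a name like 'output-1-to-2.json' (assumed naming)."""
    match = re.search(r"(\d+)-to-\d+\.json$", blob.name)
    return int(match.group(1)) if match else 0


def run_vision_ocr(source_uri, destination_uri, timeout=420):
    """Step 3: ask Cloud Vision to OCR a PDF in a bucket and wait for it to finish."""
    from google.cloud import vision  # lazy import; needs google-cloud-vision installed

    request = vision.AsyncAnnotateFileRequest(
        features=[vision.Feature(type_=vision.Feature.Type.DOCUMENT_TEXT_DETECTION)],
        input_config=vision.InputConfig(
            gcs_source=vision.GcsSource(uri=source_uri),
            mime_type="application/pdf",
        ),
        output_config=vision.OutputConfig(
            gcs_destination=vision.GcsDestination(uri=destination_uri),
            batch_size=2,  # pages per result file
        ),
    )
    client = vision.ImageAnnotatorClient()
    client.async_batch_annotate_files(requests=[request]).result(timeout=timeout)


def ocr_local_pdf(pdf_path, bucket, run_ocr):
    """Upload a local PDF, OCR it, return one text string per page, then clean up.

    bucket  -- a google.cloud.storage.Bucket (or an object with the same methods)
    run_ocr -- callable(source_uri, destination_uri) that blocks until OCR is done
    """
    # A random prefix keeps this run's objects apart from anything else in the bucket.
    prefix = f"tmp-ocr/{uuid.uuid4().hex}/"
    pdf_blob = bucket.blob(prefix + Path(pdf_path).name)
    try:
        # Step 2: upload the local file.
        pdf_blob.upload_from_filename(str(pdf_path))
        # Step 3: OCR it; results are written under the same prefix.
        run_ocr(f"gs://{bucket.name}/{pdf_blob.name}", f"gs://{bucket.name}/{prefix}")
        # Step 4: download the JSON results in page order.
        results = [b for b in bucket.list_blobs(prefix=prefix) if b.name.endswith(".json")]
        results.sort(key=first_page)
        pages = []
        for blob in results:
            payload = json.loads(blob.download_as_bytes())
            for response in payload.get("responses", []):
                pages.append(response.get("fullTextAnnotation", {}).get("text", ""))
        return pages
    finally:
        # Step 5: delete the PDF and the results, even if an earlier step failed.
        for blob in list(bucket.list_blobs(prefix=prefix)):
            blob.delete()
```

With credentials configured, it would be called along the lines of ocr_local_pdf("my.pdf", storage.Client().bucket("my-temp-bucket"), run_vision_ocr), where the file and bucket names are placeholders. Putting the deletion in a finally block is a deliberate choice here: it keeps the time the files spend in the bucket as short as possible even when the OCR step raises.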

Iñigo answered Sep 22 '22 15:09