Get text from PDF in Google

I have a PDF document that is saved in Google Drive. I can use the Google Drive Web UI search to find text in the document.

How can I programmatically extract a portion of the text in the document using Google Apps Script?

goalguy asked Mar 12 '23 19:03


1 Answer

See pdfToText() in this gist.
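
The gist's code is not reproduced on this page. As a rough idea of how such a helper could work, here is a minimal sketch, not the gist's actual implementation. It assumes the Advanced Drive Service (Drive API v2) is enabled for the script, and the name pdfToTextSketch and the ocrLanguage option are hypothetical, chosen only for illustration.

```javascript
/**
 * Hypothetical stand-in for the gist's pdfToText(); NOT the gist's code.
 * Assumption: the Advanced Drive Service (Drive API v2) is enabled, so that
 * Drive.Files.insert() and Drive.Files.remove() are available.
 *
 * @param {Blob} pdfBlob  Blob of the PDF file to OCR.
 * @param {Object} [options]  keepTextfile (boolean), ocrLanguage (string).
 * @return {string} Text recognized in the PDF.
 */
function pdfToTextSketch(pdfBlob, options) {
  options = options || {};

  // Ask Drive to OCR the PDF while converting it into a Google Doc.
  var resource = {
    title: (pdfBlob.getName() || 'untitled').replace(/\.pdf$/i, ''),
    mimeType: pdfBlob.getContentType()
  };
  var docFile = Drive.Files.insert(resource, pdfBlob, {
    ocr: true,
    ocrLanguage: options.ocrLanguage || 'en'
  });

  // Read the recognized text out of the converted document.
  var text = DocumentApp.openById(docFile.id).getBody().getText();

  // Unless the caller asked to keep it, delete the intermediate Google Doc.
  if (!options.keepTextfile) {
    Drive.Files.remove(docFile.id);
  }
  return text;
}
```

Projects that use Drive API v3 would need different calls (for example Drive.Files.create instead of Drive.Files.insert), so treat this as a starting point and prefer the gist itself where possible.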

To run Google Drive's built-in OCR on a PDF file, e.g. myPDF.pdf, pass the file's blob to pdfToText():

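function myFunction() {
  // getFilesByName() returns an iterator, so check that a match exists first
  var files = DriveApp.getFilesByName("myPDF.pdf");
  if (!files.hasNext()) {
    throw new Error("myPDF.pdf not found in Drive");
  }
  var pdfFile = files.next();
  var blob = pdfFile.getBlob();

  // Get the text from the PDF (pdfToText() comes from the gist)
  var filetext = pdfToText(blob, {keepTextfile: false});

  // Now do whatever you want with filetext...
}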
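
Since the question asks for a portion of the text, one possible next step is to slice filetext between two known markers. The sketch below is only an illustration: extractBetween and the sample marker strings are hypothetical and not part of the gist or of Apps Script.

```javascript
/**
 * Hypothetical helper: returns the trimmed text found between two marker
 * strings, or null when either marker is missing.
 */
function extractBetween(text, startMarker, endMarker) {
  var start = text.indexOf(startMarker);
  if (start === -1) return null;
  start += startMarker.length;

  // Search for the end marker only after the start marker.
  var end = text.indexOf(endMarker, start);
  if (end === -1) return null;

  return text.substring(start, end).trim();
}

// Example with made-up sample text standing in for filetext.
var sample = 'Invoice Number: 12345 Date: 2023-03-12';
var invoiceNumber = extractBetween(sample, 'Invoice Number:', 'Date:'); // '12345'
```

OCR output can contain stray whitespace or misread characters, so markers that are short and distinctive tend to match more reliably than long phrases.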
Mogsdad answered Mar 20 '23 13:03