 

Google Drive: what's the limit for indexing large files?

I'm using the Google Drive API to store and retrieve PDF files. I would like to query these files using the search parameters.

But before I start implementing this, I would like to know how Google handles the indexing of large PDF files (600+ pages, 25 MB+). I'm only interested in text-based PDFs (they don't need OCR).

I've tried some searches on the Drive website, and they don't always find the file.

I would like to know whether there are any limitations and, if so, what they are.
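For reference, querying PDFs by their indexed text through the API could look roughly like the sketch below. It assumes the Drive API v3 query syntax (`fullText contains '…'`, `mimeType = 'application/pdf'`, `trashed = false`) and a `service` object built with google-api-python-client, e.g. `build("drive", "v3", credentials=creds)`; the helper names `escape_query_value`, `build_pdf_fulltext_query` and `search_pdfs` are hypothetical. It can only return matches for text Drive has actually indexed, which is exactly the open question here.

```python
from typing import Any, Iterator


def escape_query_value(value: str) -> str:
    """Escape backslashes and single quotes for a Drive v3 query string literal."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def build_pdf_fulltext_query(term: str) -> str:
    """Build a `q` string matching non-trashed PDFs whose indexed text contains `term`."""
    return (
        f"fullText contains '{escape_query_value(term)}' "
        "and mimeType = 'application/pdf' and trashed = false"
    )


def search_pdfs(service: Any, term: str) -> Iterator[dict]:
    """Yield {'id', 'name'} dicts for every matching file, following pagination.

    `service` is assumed (hypothetically) to be a Drive v3 client object.
    """
    page_token = None
    while True:
        response = service.files().list(
            q=build_pdf_fulltext_query(term),
            fields="nextPageToken, files(id, name)",
            pageToken=page_token,  # None on the first request
        ).execute()
        yield from response.get("files", [])
        page_token = response.get("nextPageToken")
        if page_token is None:
            break
```

Because matching happens server-side, a file that doesn't show up may simply not have the searched text indexed, rather than the query being wrong.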

asked Aug 28 '12 by DavidVdd



1 Answer

According to this page, for PDFs that need OCR:

The maximum size for images (.jpg, .gif, .png) and PDF files (.pdf) is 2 MB. For PDF files, we only look at the first 10 pages when searching for text to extract.

And according to this page, for text-based PDFs:

You can search for text in PDF and image files by:

  • Typing a query in the search box in Google Drive on the web.
  • Opening the Google Drive viewer and using the search box in the upper right.

In theory you should be able to search the first 100 pages of any text documents or text-based PDFs that you've uploaded. You'll also be able to search for text found on the first ten pages of any image PDFs on your Drive.
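If those limits hold, you could roughly predict before uploading whether a phrase is likely to be searchable by checking which page it first appears on. The sketch below is a hypothetical pre-check, not Drive's actual indexing logic: it hard-codes the 100-page (text PDF) and 10-page (image PDF) limits quoted above, which Google may have changed, and uses plain case-insensitive substring matching, whereas Drive's search matches words. It expects the per-page text to be extracted beforehand, for example with pypdf: `[p.extract_text() for p in PdfReader(path).pages]`.

```python
from typing import Optional, Sequence

# Limits as quoted in the answer; assumed, not verified against current Drive behavior.
TEXT_PDF_PAGE_LIMIT = 100
IMAGE_PDF_PAGE_LIMIT = 10


def first_page_with_term(page_texts: Sequence[str], term: str) -> Optional[int]:
    """Return the 1-based number of the first page containing `term` (case-insensitive), or None."""
    needle = term.casefold()
    for number, text in enumerate(page_texts, start=1):
        if needle in text.casefold():
            return number
    return None


def likely_searchable(page_texts: Sequence[str], term: str, image_pdf: bool = False) -> bool:
    """Hypothetical check: does `term` first occur within the page range Drive reportedly indexes?"""
    limit = IMAGE_PDF_PAGE_LIMIT if image_pdf else TEXT_PDF_PAGE_LIMIT
    page = first_page_with_term(page_texts, term)
    return page is not None and page <= limit
```

For a 600-page text PDF like the one in the question, a term that appears only after page 100 would come back as not searchable under these assumptions.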

answered Oct 01 '22 by Jason Sperske
