
loading only needed pages from the server when using pdf.js

I am using pdf.js to view PDF documents. pdf.js first fetches the whole document from the server and only then starts rendering, and this behavior causes two problems:

  • if the PDF document is large, it takes a long time to load.
  • the cached document leaks memory when someone reads on a mobile device.

I think using HTTP range requests to fetch only the pages the user is browsing, rather than the whole document, would solve both problems.
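For reference, an HTTP range request just asks the server for a byte window of the file instead of the whole thing. A minimal sketch (the helper name and the fixed chunk size are my own, not part of pdf.js) of building the `Range` header for the nth chunk of a document:

```javascript
// Build the Range header value for chunk `chunkIndex` of a file,
// assuming fixed-size chunks. Range byte positions are inclusive,
// so the last byte of a chunk is start + chunkSize - 1.
function rangeHeaderForChunk(chunkIndex, chunkSize, fileSize) {
  const start = chunkIndex * chunkSize;
  const end = Math.min(start + chunkSize - 1, fileSize - 1);
  return `bytes=${start}-${end}`;
}

// Usage sketch: fetch one 64 KiB slice of the document.
// fetch(url, { headers: { Range: rangeHeaderForChunk(0, 65536, fileSize) } });
```

The server must answer with `206 Partial Content` (and advertise `Accept-Ranges: bytes`) for this to work; if it ignores the header, you get the whole file back anyway.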

Here is a PR that implements range requests, but the requests keep running until the whole document is loaded rather than fetching pages as you go: https://github.com/mozilla/pdf.js/pull/2719

Any help?

Mahmoud Felfel asked Dec 21 '25 16:12

1 Answer

If memory serves me right about the PDF format, the document is not easily splittable unless you know the exact byte range of each page before making a call (and even then, I'm not sure JavaScript can handle the binary manipulation efficiently enough, or that you're up for modifying the pdf.js library to do it). What you might want to do instead is split your documents by page server-side (using PHP or another language) and, rather than loading the entire document with pdf.js, load the pages one by one.

This has some benefits and some drawbacks. The drawbacks:

  • You'll need to programmatically split the PDFs. This isn't actually that hard, but it is a bit tedious.
  • You'll also need a way to pass all the URIs of the per-page PDFs to your viewer. This is also fairly easy.

The advantages should be obvious: less bandwidth usage, the ability to provide a page-by-page view, and the ability to save individual pages.
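For the second drawback above, one simple convention is to derive the per-page URIs from a predictable URL layout. A sketch, where the `/docs/<id>/page-<n>.pdf` scheme and the helper name are my assumptions, not something prescribed by pdf.js:

```javascript
// Build the list of per-page PDF URLs for the viewer, assuming the
// server exposes pre-split pages at /docs/<docId>/page-<n>.pdf
// (this URL layout is an assumption for illustration).
function pageUrls(docId, pageCount) {
  const urls = [];
  for (let n = 1; n <= pageCount; n++) {
    urls.push(`/docs/${encodeURIComponent(docId)}/page-${n}.pdf`);
  }
  return urls;
}

// Each URL is a small standalone PDF, so the viewer can hand it to
// pdf.js on demand, e.g. pdfjsLib.getDocument(urls[pageIndex - 1]),
// instead of loading one large document up front.
```

The viewer only ever downloads the page the user is actually looking at, which is the fetch-as-you-go behavior the question asks for, at the cost of the server-side split step.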

Sébastien Renauld answered Dec 23 '25 05:12