
Chrome extension: How to show custom UI for a PDF file?

I'm trying to write a Google Chrome extension for showing PDF files. As soon as I detect that the browser is navigating to a URL pointing to a PDF file, I want it to stop loading the default PDF viewer and show my UI instead. The UI will use PDF.js to render the PDF and jQuery UI to show some other stuff.

Question: how do I do this? It's very important to block the original PDF viewer, because I don't want to double memory consumption by showing two instances of the document. Therefore, I should somehow navigate the tab to my own view.

Igor Soloydenko asked Jan 04 '15 21:01



2 Answers

As the main author of the PDF.js Chrome extension, I can share some insights about the logic behind building a PDF Viewer extension for Chrome.

How to detect a PDF file?

In a perfect world, every website would serve PDF files with the standard application/pdf MIME-type. Unfortunately, the real world is not perfect, and in practice there are many websites which use an incorrect MIME-type. You will catch the majority of the cases by selecting requests that satisfy any of the following conditions:

  • The resource is served with the Content-Type: application/pdf response header.
  • The resource is served with the Content-Type: application/octet-stream response header, and its URL contains ".pdf" (case-insensitive).

Besides that, you also have to detect whether the user wants to view the PDF file or download the PDF file. If you don't care about the distinction, it's easy: Just intercept the request if it matches any of the previous conditions.
Otherwise (and this is the approach I've taken), you need to check whether the Content-Disposition response header exists and its value starts with "attachment".
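
The checks above can be sketched as small helper functions. This is only an illustration under assumptions: the names getHeader, isPdfFile and isDownload are hypothetical, and `details` is assumed to have the shape that chrome.webRequest passes to an onHeadersReceived listener (a `url` string and a `responseHeaders` array of name/value pairs).

```javascript
// Hypothetical helpers; the names are illustrative only.

// Return the value of a response header (case-insensitive name match), or ''.
function getHeader(responseHeaders, name) {
    var headers = responseHeaders || [];
    name = name.toLowerCase();
    for (var i = 0; i < headers.length; i++) {
        if (headers[i].name.toLowerCase() === name)
            return headers[i].value || '';
    }
    return '';
}

// True if the response matches either of the two conditions listed above.
function isPdfFile(details) {
    // Drop parameters such as "; charset=..." before comparing.
    var type = getHeader(details.responseHeaders, 'content-type')
        .split(';')[0].trim().toLowerCase();
    if (type === 'application/pdf')
        return true;
    if (type === 'application/octet-stream')
        return details.url.toLowerCase().indexOf('.pdf') !== -1;
    return false;
}

// True if the server asked for a download rather than inline display.
function isDownload(details) {
    var disposition = getHeader(details.responseHeaders, 'content-disposition');
    return /^\s*attachment/i.test(disposition);
}
```

A listener would then intercept a request only when isPdfFile(details) is true and isDownload(details) is false.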

If you want to support PDF downloads (e.g. via your UI), then you need to add the Content-Disposition: attachment response header. If the header already exists, then you have to replace the existing disposition type (e.g. inline) with "attachment". Don't bother with trying to parse the full header value, just strip the first part up to the first semicolon, then put "attachment" in front of it. (If you really want to parse the header, read RFC 2616 (section 19.5.1) and RFC 6266).
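
The "strip up to the first semicolon" rule could look roughly like this. It is a minimal sketch, and toAttachment is a hypothetical helper name, not something taken from the PDF.js source.

```javascript
// Hypothetical helper: force the disposition type to "attachment" while
// keeping any parameters (such as filename) that follow the first semicolon.
function toAttachment(value) {
    if (!value)
        return 'attachment'; // Header missing or empty: add a bare one.
    var semicolon = value.indexOf(';');
    // No semicolon means the header holds only a disposition type.
    var params = semicolon === -1 ? '' : value.slice(semicolon);
    return 'attachment' + params;
}
```

The result would be written back into the responseHeaders array that a blocking onHeadersReceived listener returns.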

Which Chrome (extension) APIs should I use to intercept PDF files?

The chrome.webRequest API can be used to intercept and redirect requests. With the following logic, you can intercept and redirect PDFs to your custom viewer that requests the PDF file from the given URL.

chrome.webRequest.onHeadersReceived.addListener(function(details) {
    // isPdfFile is a placeholder that you have to define yourself, using
    // the detection logic described at the top of this answer.
    if (!isPdfFile(details))
        return; // Nope, not a PDF file. Ignore this request.

    // chrome.runtime.getURL replaces the deprecated chrome.extension.getURL.
    var viewerUrl = chrome.runtime.getURL('viewer.html') +
      '?file=' + encodeURIComponent(details.url);
    return { redirectUrl: viewerUrl };
}, {
    urls: ["<all_urls>"],
    types: ["main_frame", "sub_frame"]
}, ["responseHeaders", "blocking"]);

(see https://github.com/mozilla/pdf.js/blob/master/extensions/chromium/pdfHandler.js for the actual implementation of the PDF detection using the logic described at the top of this answer)

With the above code, you can intercept any PDF file on http and https URLs. If you want to view PDF files from the local filesystem and/or ftp, then you need to use the chrome.webRequest.onBeforeRequest event instead of onHeadersReceived. Fortunately, you can assume that if the file ends with ".pdf", then the resource is most likely a PDF file. Users who want to use the extension to view a local PDF file have to explicitly allow this at the extension settings page though.
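
For local files, the onBeforeRequest variant might be sketched as follows. The assumptions here are that the viewer page is called viewer.html as in the example above, that isLocalPdfUrl is a hypothetical helper, and that the URL filter is case-sensitive (which is why both spellings of the extension are listed). Only file: patterns are registered; an ftp: pattern could be added in the same way where the browser still supports it.

```javascript
// Hypothetical helper: true for file: or ftp: URLs ending in ".pdf".
function isLocalPdfUrl(url) {
    return /^(file|ftp):\/\/.*\.pdf$/i.test(url);
}

// Register the listener only when running inside an extension.
if (typeof chrome !== 'undefined' && chrome.webRequest) {
    chrome.webRequest.onBeforeRequest.addListener(function(details) {
        if (!isLocalPdfUrl(details.url))
            return; // Not a local PDF. Ignore this request.

        var viewerUrl = chrome.runtime.getURL('viewer.html') +
          '?file=' + encodeURIComponent(details.url);
        return { redirectUrl: viewerUrl };
    }, {
        urls: ['file://*/*.pdf', 'file://*/*.PDF'],
        types: ['main_frame', 'sub_frame']
    }, ['blocking']);
}
```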

On Chrome OS, use the chrome.fileBrowserHandler API to register your extension as a PDF Viewer (https://github.com/mozilla/pdf.js/blob/master/extensions/chromium/pdfHandler-vcros.js).

The methods based on the webRequest API only work for PDFs in top-level documents and frames, not for PDFs embedded via <object> and <embed>. Although they are rare, I still wanted to support them, so I came up with an unconventional method to detect and load the PDF viewer in these contexts. The implementation can be viewed at https://github.com/mozilla/pdf.js/pull/4549/files. This method relies on the fact that when an element is put in the document, it eventually has to be rendered. When it is rendered, CSS styles get applied. When I declare an animation for the embed/object elements in the CSS, animation events will be triggered. These events bubble up in the document. I can then add a listener for this event, and replace the content of the object/embed element with an iframe that loads my PDF Viewer.
There are several ways to replace an element or content, but I've used Shadow DOM to change the displayed content without affecting the DOM in the page.
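
The animation trick can be sketched as a content script along these lines. Everything named here is an assumption for illustration: ANIMATION_NAME, looksLikePdf and viewer.html are hypothetical, the standard animationstart event is used, and the element is simply swapped for an iframe rather than hidden behind Shadow DOM as described above, so unlike the real implementation this version does change the page's DOM.

```javascript
// Hypothetical names throughout; this is not the PDF.js implementation.
var ANIMATION_NAME = 'detect-object-or-embed';

// Decide from the element's attributes whether it embeds a PDF.
function looksLikePdf(type, url) {
    if (type && type.toLowerCase() === 'application/pdf')
        return true;
    return /\.pdf($|[?#])/i.test(url || '');
}

// Run the DOM part only in a page context (e.g. as a content script).
if (typeof document !== 'undefined') {
    // An empty keyframe is enough: only the animation event matters.
    var style = document.createElement('style');
    style.textContent =
        '@keyframes ' + ANIMATION_NAME + ' { from {} }' +
        'object, embed { animation: ' + ANIMATION_NAME + ' 1ms; }';
    document.documentElement.appendChild(style);

    document.addEventListener('animationstart', function(event) {
        if (event.animationName !== ANIMATION_NAME)
            return; // Some other animation on the page.
        var el = event.target;
        var url = el.src || el.data; // <embed src=...> or <object data=...>
        if (!looksLikePdf(el.type, url))
            return;
        // Simplification: replace the element itself with the viewer frame.
        var frame = document.createElement('iframe');
        frame.src = chrome.runtime.getURL('viewer.html') +
          '?file=' + encodeURIComponent(url);
        el.parentNode.replaceChild(frame, el);
    }, true);
}
```

For the iframe to load, the viewer page would presumably also need to be listed under web_accessible_resources in the manifest.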

Limitations and notes

The method described here has a few limitations:

  • The PDF file is requested at least two times from the server: First a usual request to get the headers, which gets aborted when the extension redirects to the PDF Viewer. Then another request to request the actual data.
    Consequently, if a file is valid only once, then the PDF cannot be displayed (the first request invalidates the URL and the second request fails).

  • This method only works for GET requests. There is no public API to directly get response bodies from a request in a Chrome extension (crbug.com/104058).

  • The method to get PDFs to work for <object> and <embed> elements requires a script to run on every page. I've profiled the code and found that the impact on performance is negligible, but you still need to be careful if you want to change the logic.
    (I first tried to use Mutation Observers for detection, which slowed down the page load by 3-20% on huge documents, and caused an additional 1.5 GB peak in memory usage in a complex DOM benchmark).

  • The method to detect <object> / <embed> tags might still cause any NPAPI/PPAPI-based PDF plugins to load, because it only replaces the <embed>/<object> tag's content after the tag has already been inserted and rendered. When a tab is inactive, animations are not scheduled, and hence the dispatch of the animation event will be significantly delayed.

Afterword

PDF.js is open source; you can view the code for the Chrome extension at https://github.com/mozilla/pdf.js/tree/master/extensions/chromium. If you browse the source, you'll notice that the code is a bit more complex than I explained here. That's because extensions could not redirect requests at the onHeadersReceived event until I implemented it a few months ago (crbug.com/280464, Chrome 35).

And there is also some logic to make the URL in the omnibox look a bit better.

The PDF.js extension continues to evolve, so unless you want to significantly change the UI of the PDF Viewer, I suggest asking users to install PDF.js's official PDF Viewer from the Chrome Web Store, and/or opening issues on PDF.js's issue tracker for reasonable feature requests.

Rob W answered Oct 23 '22 22:10


Custom PDF Viewer

Basically, to accomplish what you want to do you'll need to:

  1. Intercept the PDF's URL when it's loaded;
  2. Stop the PDF from loading;
  3. Start your own PDF viewer and load the PDF inside it.

How to

  1. Using the chrome.webRequest API you can easily listen to the web requests made by Chrome, and, more specifically, the ones that are going to load .pdf files. Using the chrome.webRequest.onBeforeRequest event you can listen to all the requests that end with ".pdf" and get the URL of the requested resource.

  2. Create a page, for example display_pdf.html where you will show the PDFs and do whatever you want with them.

  3. In the chrome.webRequest.onBeforeRequest listener, prevent the resource from being loaded by returning {redirectUrl: ...} to redirect to your display_pdf.html page.

  4. Pass the PDF's URL to your page. This can be done in several ways, but, for me, the simplest one is to append the encoded PDF URL to your page's URL as a query string, something like display_pdf.html?url=http%3A%2F%2Fwww.example.com%2Fexample.pdf.

  5. Inside the page, get the URL with JavaScript and process and render the PDF with any library you want, like PDF.js.

The code

Following the above steps, your extension will look like this:

<root>/
    /background.js
    /display_pdf.html
    /display_pdf.js
    /manifest.json

So, first of all, let's look at the manifest.json file: you will need to declare the permissions for webRequest and webRequestBlocking, so it should look like this:

{
    "manifest_version": 2,

    "name": "PDF Test",
    "version": "0.0.1",

    "background": {
        "scripts": ["/background.js"]
    },

    "permissions": ["webRequest", "webRequestBlocking", "<all_urls>"]
}

Then, in your background.js you will listen to the chrome.webRequest.onBeforeRequest event and redirect the request for the PDF to your custom display_pdf.html page, like this:

chrome.webRequest.onBeforeRequest.addListener(function(details) {
    var displayURL;

    // Only handle requests whose URL ends with ".pdf" (case-insensitive).
    if (/\.pdf$/i.test(details.url)) {
        displayURL = chrome.runtime.getURL('/display_pdf.html') + '?url=' + encodeURIComponent(details.url);

        // Returning redirectUrl stops the original request and
        // navigates to your custom display page instead.
        return {redirectUrl: displayURL};
    }
}, {urls: ['*://*/*.pdf']}, ['blocking']);

And finally, in your display_pdf.js file you will extract the PDF's URL from the query string and use it to do whatever you want:

var PDF_URL = decodeURIComponent(location.href.split('?url=')[1]);
// this will be something like http://www.somesite.com/path/to/example.pdf

alert('The PDF url is: ' + PDF_URL);
// do something with the pdf... like processing it with PDF.js
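
As a slightly more robust alternative to splitting the address by hand, the query string can be read through the URL API and the first page drawn with PDF.js. This is a sketch under assumptions: getPdfUrl and the canvas id pdf-canvas are hypothetical, and the rendering part assumes a PDF.js build that exposes the global pdfjsLib with its promise-based API and an already configured worker.

```javascript
// Hypothetical helper: read the "url" query parameter from a page address.
function getPdfUrl(href) {
    // searchParams.get() returns the already-decoded value, or null.
    return new URL(href).searchParams.get('url');
}

// Render page 1 only when running in the viewer page with PDF.js loaded.
if (typeof document !== 'undefined' && typeof pdfjsLib !== 'undefined') {
    var canvas = document.getElementById('pdf-canvas'); // assumed <canvas>
    pdfjsLib.getDocument(getPdfUrl(location.href)).promise.then(function(pdf) {
        return pdf.getPage(1);
    }).then(function(page) {
        var viewport = page.getViewport({ scale: 1.5 });
        canvas.width = viewport.width;
        canvas.height = viewport.height;
        return page.render({
            canvasContext: canvas.getContext('2d'),
            viewport: viewport
        }).promise;
    });
}
```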

Working Example

A working example of what I said above can be found HERE.

Documentation links

I recommend taking a look at the official documentation for the APIs mentioned above, which you can find at these links:

  • chrome.webRequest API
    • chrome.webRequest.onBeforeRequest event
  • chrome.runtime API
    • chrome.runtime.getURL method

Marco Bonelli answered Oct 23 '22 22:10