Open Puppeteer with specific configuration (download PDF instead of PDF viewer)

2 Answers

There is no option you can pass to Puppeteer to force PDF downloads. However, you can use the Chrome DevTools Protocol (CDP) to add a Content-Disposition: attachment response header, which makes the browser download the file instead of opening it in the PDF viewer.

A visual flow of what you need to do:

[Figure: cdp-modify-response-header]

Here is a full example. It downloads PDF and XML files in headful mode.

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: null, 
  });

  const page = await browser.newPage();

  const client = await page.target().createCDPSession();

  await client.send('Fetch.enable', {
    patterns: [
      {
        urlPattern: '*',
        requestStage: 'Response',
      },
    ],
  });

  // client.on() registers the listener synchronously, so no await is needed.
  client.on('Fetch.requestPaused', async (reqEvent) => {
    const { requestId } = reqEvent;

    let responseHeaders = reqEvent.responseHeaders || [];
    let contentType = '';

    for (let elements of responseHeaders) {
      if (elements.name.toLowerCase() === 'content-type') {
        contentType = elements.value;
      }
    }

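    // Ignore parameters such as "; charset=utf-8" when matching the MIME type.
    const mimeType = contentType.split(';')[0].trim().toLowerCase();

    if (mimeType.endsWith('pdf') || mimeType.endsWith('xml')) {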
      responseHeaders.push({
        name: 'content-disposition',
        value: 'attachment',
      });

      const responseObj = await client.send('Fetch.getResponseBody', {
        requestId,
      });

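      await client.send('Fetch.fulfillRequest', {
        requestId,
        // Preserve the original status code instead of assuming 200.
        responseCode: reqEvent.responseStatusCode || 200,
        responseHeaders,
        // Fetch.fulfillRequest expects a base64-encoded body.
        body: responseObj.base64Encoded
          ? responseObj.body
          : Buffer.from(responseObj.body).toString('base64'),
      });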
    } else {
      await client.send('Fetch.continueRequest', { requestId });
    }
  });

  await page.goto('https://pdf-xml-download-test.vercel.app/');

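  // Keep the browser open for 100 seconds so the downloads can finish.
  // page.waitFor() was deprecated and later removed, so use a plain timer.
  await new Promise((resolve) => setTimeout(resolve, 100000));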

  await client.send('Fetch.disable');

  await browser.close();
})();
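One detail worth isolating is the header rewrite itself. Pushing a second content-disposition entry can leave the response with two conflicting headers if the server already sent one (for example, inline). Below is a minimal sketch of two pure helpers that could be called from the Fetch.requestPaused handler. The names forceAttachment and isDownloadable are hypothetical and are not part of Puppeteer or CDP; they only operate on the { name, value } header array that the event provides.

```javascript
// Hypothetical helper: returns a new header list in which any existing
// content-disposition entry is replaced by a single "attachment" entry.
function forceAttachment(responseHeaders = []) {
  const kept = responseHeaders.filter(
    (header) => header.name.toLowerCase() !== 'content-disposition'
  );
  return [...kept, { name: 'content-disposition', value: 'attachment' }];
}

// Hypothetical helper: true when the content-type header names a PDF or XML
// type, ignoring parameters such as "; charset=utf-8".
function isDownloadable(responseHeaders = []) {
  const header = responseHeaders.find(
    (h) => h.name.toLowerCase() === 'content-type'
  );
  if (!header) return false;
  const mimeType = header.value.split(';')[0].trim().toLowerCase();
  return mimeType.endsWith('pdf') || mimeType.endsWith('xml');
}
```

Note that this sketch drops any filename parameter the server put in its original content-disposition header. If you need to keep the suggested filename, you would have to copy that parameter over.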

For a more detailed explanation, please refer to the Git repo I've set up, which contains commented code. It also includes example code for Playwright.

answered Nov 02 '22 23:11

subwaymatch

Puppeteer currently does not make it easy to navigate to (or download) PDFs in headless mode. To quote the docs for the page.goto function:

NOTE Headless mode doesn't support navigation to a PDF document. See the upstream issue.

What you can do, though, is detect when the browser is navigating to the PDF file and then download it yourself via Node.js.

Code sample

const puppeteer = require('puppeteer');
const http = require('http');
const fs = require('fs');

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();

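    page.on('request', req => {
        if (req.url() === '...') {
            const file = fs.createWriteStream('./file.pdf');
            // http.get throws on https: URLs, so pick the module that matches the protocol.
            const client = req.url().startsWith('https:') ? require('https') : http;
            client.get(req.url(), response => response.pipe(file));
        }
    });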

    await page.goto('...');
    await browser.close();
})();

This navigates to a URL and monitors the outgoing requests. When the matching request is found, Node.js downloads the file itself via http.get and pipes it into file.pdf. Please be aware that this is a minimal working example. You will want to catch download errors, and depending on the situation you might want something more sophisticated than http.get.
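As a rough illustration of the error handling mentioned above, here is a sketch of a download helper built only on Node's standard library. The name downloadFile is hypothetical. The sketch assumes the URL answers directly with status 200: it does not follow redirects, and it does not reuse the browser's cookies or session.

```javascript
const fs = require('fs');
const http = require('http');
const https = require('https');
const { pipeline } = require('stream');

// Hypothetical helper: downloads `url` to `destination`. Resolves with the
// destination path, or rejects on a network error, a non-200 status code,
// or a failure while writing the file.
function downloadFile(url, destination) {
  // Pick the module that matches the URL's protocol.
  const client = url.startsWith('https:') ? https : http;

  return new Promise((resolve, reject) => {
    client
      .get(url, (response) => {
        if (response.statusCode !== 200) {
          response.resume(); // discard the body so the socket is released
          reject(new Error(`Unexpected status code ${response.statusCode}`));
          return;
        }
        // pipeline() forwards errors from either stream to the callback.
        pipeline(response, fs.createWriteStream(destination), (error) => {
          if (error) reject(error);
          else resolve(destination);
        });
      })
      .on('error', reject);
  });
}
```

Inside the request handler you could then call downloadFile(req.url(), './file.pdf') and await the returned promise before closing the browser, so that a failed download surfaces as a rejected promise instead of a silently truncated file.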

Future note

In the future, there might be an easier way to do this. Once Puppeteer supports response interception, you will be able to simply force the browser to download a document, but as of May 2019 this is not supported.

answered Nov 02 '22 22:11

Thomas Dondorf


Tags:

node.js

puppeteer

Jeck
