
How does header and footer printing work in Puppeteer's page.pdf API?

Tags:

puppeteer

I've noticed a few inconsistencies when trying to use the headerTemplate and footerTemplate options with page.pdf:

The DPI for headers and footers seems to be lower (72 vs 96 for the main body, I think). So if I'm trying to match the margins, I have to scale by that.
Styles are not shared with the main body so I have to include them in the template.
If I try to use a locally stored font, it works on the main body but not in the header/footer even if I include the same CSS in the header/footer template.

I suspect that this happens because headers and footers are treated as separate documents and converted to image/pdf separately (https://cs.chromium.org/chromium/src/components/printing/resources/print_header_footer_template_page.html also implies something like that). Can someone familiar with the implementation explain how it actually works? Thanks!


asked Jul 21 '18 16:07

Shrey

2 Answers

Short Answer:

Puppeteer controls Chrome or Chromium over the DevTools Protocol.

Chromium uses Skia for PDF generation.

Skia handles the header, set of objects, and footer separately.

Detailed Answer:

From the Puppeteer Documentation:

page.pdf(options)

options <Object> Options object which might have the following properties:

headerTemplate <string> HTML template for the print header. Should be valid HTML markup with following classes used to inject printing values into them:

date formatted print date

title document title

url document location

pageNumber current page number

totalPages total pages in the document

footerTemplate <string> HTML template for the print footer. Should use the same format as the headerTemplate.

returns: <Promise<Buffer>> Promise which resolves with PDF buffer.

NOTE Generating a pdf is currently only supported in Chrome headless.

NOTE headerTemplate and footerTemplate markup have the following limitations:

Script tags inside templates are not evaluated.

Page styles are not visible inside templates.
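
As a rough illustration of the options quoted above, here is a minimal sketch of a helper that builds the two templates. Because page styles are not visible inside templates, the CSS is inlined into each one. The name buildPdfOptions and its parameters are hypothetical and not part of Puppeteer; displayHeaderFooter and margin are existing page.pdf options that the excerpt above does not quote.

```javascript
// Hypothetical helper: builds an options object for page.pdf() whose
// header and footer templates carry their own styles.
function buildPdfOptions({ fontSizePx = 10, marginTop = '20mm', marginBottom = '20mm' } = {}) {
  // Page styles do not reach the templates, so every rule is inlined here.
  const style = `<style>.hf { font-size: ${fontSizePx}px; width: 100%; text-align: center; }</style>`;

  // Elements with the class names date, title, url, pageNumber and
  // totalPages get their text injected when the page is printed.
  const headerTemplate =
    `${style}<div class="hf"><span class="title"></span> (<span class="date"></span>)</div>`;
  const footerTemplate =
    `${style}<div class="hf">Page <span class="pageNumber"></span> of <span class="totalPages"></span></div>`;

  return {
    displayHeaderFooter: true, // header and footer are only printed when this is true
    headerTemplate,
    footerTemplate,
    margin: { top: marginTop, bottom: marginBottom }, // leave room for header and footer
  };
}

// Usage, assuming `page` is a Puppeteer Page:
//   const pdfBuffer = await page.pdf(buildPdfOptions({ fontSizePx: 9 }));
```

Keeping the style string in one place means the header and footer stay visually consistent even though neither can see the page's own stylesheet.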

We can learn from the Puppeteer source code for page.pdf() that:

The Chrome DevTools Protocol method Page.printToPDF (along with the headerTemplate and footerTemplate parameters) is sent to page._client.
page._client is an instance of page.target().createCDPSession() (a Chrome DevTools Protocol session).
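
To make that hand-off concrete, here is a minimal sketch that opens its own CDP session and sends Page.printToPDF directly, instead of going through page.pdf(). It assumes `page` is a Puppeteer Page; printViaCdp is a hypothetical name, and this is not a copy of Puppeteer's internal code.

```javascript
// Sketch: send Page.printToPDF over a dedicated CDP session.
// `page` is assumed to be a Puppeteer Page; `printViaCdp` is a hypothetical name.
async function printViaCdp(page, { headerTemplate, footerTemplate }) {
  const client = await page.target().createCDPSession();
  try {
    const { data } = await client.send('Page.printToPDF', {
      displayHeaderFooter: true,
      headerTemplate,
      footerTemplate,
    });
    // The protocol returns the PDF as a base64-encoded string.
    return Buffer.from(data, 'base64');
  } finally {
    // Release the session whether or not printing succeeded.
    await client.detach();
  }
}

// Usage, assuming `page` is a Puppeteer Page:
//   const pdf = await printViaCdp(page, {
//     headerTemplate: '<span class="title"></span>',
//     footerTemplate: '<span class="pageNumber"></span>',
//   });
```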

From the Chrome DevTools Protocol Viewer, we can see that Page.printToPDF contains the parameters headerTemplate and footerTemplate:

Page.printToPDF

Print page as PDF.

PARAMETERS

headerTemplate string (optional)

HTML template for the print header. Should be valid HTML markup with following classes used to inject printing values into them:

date: formatted print date

title: document title

url: document location

pageNumber: current page number

totalPages: total pages in the document

For example, <span class=title></span> would generate span containing the title.

footerTemplate string (optional)

HTML template for the print footer. Should use the same format as the headerTemplate.

RETURN OBJECT

data string

Base64-encoded pdf data.

The Chromium source code for Page.printToPDF shows us that:

The Page.printToPDF parameters are passed to the sendDevToolsMessage function, which issues a DevTools protocol command and returns a promise for the results.

After further digging, we can see that Chromium has a concrete implementation of a class called SkDocument that creates PDF files.

SkDocument comes from the Skia Graphics Library, which Chromium uses for PDF generation.

The Skia PDF Theory of Operation, in the PDF Objects and Document Structure section, states that:

Background: The PDF file format has a header, a set of objects and then a footer that contains a table of contents for all of the objects in the document (the cross-reference table). The table of contents lists the specific byte position for each object. The objects may have references to other objects and the ASCII size of those references is dependent on the object number assigned to the referenced object; therefore we can’t calculate the table of contents until the size of objects is known, which requires assignment of object numbers. The document uses SkWStream::bytesWritten() to query the offsets of each object and build the cross-reference table.

The document explains further down:

The PDF backend requires all indirect objects used in a PDF to be added to the SkPDFObjNumMap of the SkPDFDocument. The catalog is responsible for assigning object numbers and generating the table of contents required at the end of PDF files. In some sense, generating a PDF is a three step process. In the first step all the objects and references among them are created (mostly done by SkPDFDevice). In the second step, SkPDFObjNumMap assigns and remembers object numbers. Finally, in the third step, the header is printed, each object is printed, and then the table of contents and trailer are printed. SkPDFDocument takes care of collecting all the objects from the various SkPDFDevice instances, adding them to an SkPDFObjNumMap, iterating through the objects once to set their file positions, and iterating again to generate the final PDF.
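
The three steps in the quote can be illustrated with a toy writer. This is only a sketch of the file layout the quote describes (header, numbered objects, cross-reference table, trailer); it is not Skia's code, writeToyPdf is a hypothetical name, and it assumes the object bodies are plain ASCII strings.

```javascript
// Toy illustration of the three steps quoted above; this is NOT Skia's code.
function writeToyPdf(bodies) {
  // Step 1: the object bodies already exist (passed in as ASCII strings).
  // Step 2: assign object numbers 1..n in order.
  const numbered = bodies.map((body, i) => ({ num: i + 1, body }));

  // Step 3: print the file header, then each object, recording the byte
  // offset at which every object starts.
  let out = '%PDF-1.4\n';
  const offsets = [];
  for (const { num, body } of numbered) {
    offsets.push(Buffer.byteLength(out));
    out += `${num} 0 obj\n${body}\nendobj\n`;
  }

  // Cross-reference table (the "table of contents"): one 20-byte entry per object.
  const xrefOffset = Buffer.byteLength(out);
  out += `xref\n0 ${numbered.length + 1}\n`;
  out += '0000000000 65535 f \n'; // entry 0 is the head of the free list
  for (const offset of offsets) {
    out += `${String(offset).padStart(10, '0')} 00000 n \n`;
  }

  // Trailer: names the root object (object 1 here) and where the xref table starts.
  out += `trailer\n<< /Size ${numbered.length + 1} /Root 1 0 R >>\n`;
  out += `startxref\n${xrefOffset}\n%%EOF\n`;
  return { pdf: out, offsets, xrefOffset };
}

// Usage:
//   const { pdf } = writeToyPdf([
//     '<< /Type /Catalog /Pages 2 0 R >>',
//     '<< /Type /Pages /Kids [] /Count 0 >>',
//   ]);
```

Note that the "header" and "footer" in this part of the Skia document are parts of the PDF file format, which is why the offsets can only be written after every object has been serialized.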


answered Oct 13 '22 05:10

Grant Miller

Thanks to the other answer (https://stackoverflow.com/a/51460641/364131) and codesearch, I think I found most of the answers I was looking for.

The printing implementation is in PrintPageInternal. It uses two separate WebFrames: one to render the content, and one to render the header and footer. To render the header and footer, it creates a special frame, writes the contents of print_header_footer_template_page.html to this frame, calls the setup function with the options provided, and then prints to a shared canvas. After this, the rest of the page content is printed on the same canvas within the bounds defined by the margins.

Headers and footers are scaled by a fudge_factor that isn't applied to the rest of the content. This may be related to the DPI difference noted in the question, since the fudge_factor is 1.33333333f, which is equal to 96/72.
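
The arithmetic behind that ratio can be sketched as follows. The helper names are hypothetical. Whether a given template dimension needs to be multiplied or divided to line up with the body is an assumption to verify against real output, since the explanation above is only a guess about why the factor exists.

```javascript
// 96 CSS pixels and 72 points both measure one inch.
const CSS_PX_PER_INCH = 96;
const POINTS_PER_INCH = 72;
const FUDGE_FACTOR = CSS_PX_PER_INCH / POINTS_PER_INCH; // 1.3333...

// Hypothetical helpers for converting a length between the two scales.
function cssPxToPoints(px) {
  return (px * POINTS_PER_INCH) / CSS_PX_PER_INCH;
}

function pointsToCssPx(pt) {
  return (pt * CSS_PX_PER_INCH) / POINTS_PER_INCH;
}

// Rounded to 32-bit floats, the literal 1.33333333f and 96/72 are the same value.
const matchesSourceLiteral = Math.fround(1.33333333) === Math.fround(FUDGE_FACTOR);

// Usage: a 48px body margin corresponds to cssPxToPoints(48) units at 72 per inch.
```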

I'm guessing this special frame is what prevents the header and footer from sharing the same resources (styles, fonts, etc.) as the contents of the page. It probably isn't set up to load (and wait for) any additional resources requested by the header and footer templates, which is why the requested fonts don't load.

answered Oct 13 '22 06:10

Shrey
