Selenium print PDF in A4 format

I have the following code for printing to PDF (and it works), and I am using only Google Chrome for printing.

import json


def send_devtools(driver, command, params=None):
    # pylint: disable=protected-access
    if params is None:
        params = {}
    resource = "/session/%s/chromium/send_command_and_get_result" % driver.session_id
    url = driver.command_executor._url + resource
    body = json.dumps({"cmd": command, "params": params})
    resp = driver.command_executor._request("POST", url, body)
    return resp.get("value")


def export_pdf(driver):
    command = "Page.printToPDF"
    params = {"format": "A4"}
    result = send_devtools(driver, command, params)
    data = result.get("data")
    return data

As we can see, I am using Page.printToPDF to print to base64, passing "A4" as the format in the params parameter.

Unfortunately, this parameter seems to be ignored. I saw some Puppeteer code that uses it (format A4) and thought it could help me.

Even with a hardcoded width and height (see below) I have no luck.

"paperWidth": 8.27,  # inches
"paperHeight": 11.69,  # inches

Using the code above, is it possible to set the page to A4 format?

asked Jun 28 '21 13:06

Rodrigo

1 Answer

UPDATED POST 07-17-2021

I decided to verify the output of my original code using the Python package pdfminer.six.

from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfpage import PDFPage
from pdfminer.pdfparser import PDFParser

# read the page size (MediaBox, in points) of every page in the PDF
with open('test_1.pdf', 'rb') as pdf_file:
    parser = PDFParser(pdf_file)
    doc = PDFDocument(parser)
    for page in PDFPage.create_pages(doc):
        print(page.mediabox)
        # output: [0, 0, 612, 792]

I was surprised when I converted these point sizes to inches. At 72 points per inch the page is 8.5 x 11 inches (US Letter), which is not the A4 paper size of 8.27 x 11.69 inches. This led me to explore the issue further by looking through the Chromium and Selenium source code.
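
The conversion here is 72 points per inch. A minimal sketch of that arithmetic follows. The helper names points_to_inches and guess_paper_size, the lookup table, and the 0.05 inch tolerance are illustrative assumptions and not part of pdfminer.six.

```python
POINTS_PER_INCH = 72.0  # a PDF point is 1/72 of an inch

# Nominal paper sizes in inches (width, height); A4 is 210 x 297 mm.
KNOWN_SIZES = {
    "Letter": (8.5, 11.0),
    "A4": (210 / 25.4, 297 / 25.4),
}


def points_to_inches(points):
    """Convert a length in PDF points to inches."""
    return points / POINTS_PER_INCH


def guess_paper_size(mediabox, tolerance=0.05):
    """Return the known size matching an [x0, y0, x1, y1] mediabox, or None."""
    x0, y0, x1, y1 = mediabox
    width = points_to_inches(x1 - x0)
    height = points_to_inches(y1 - y0)
    for name, (known_width, known_height) in KNOWN_SIZES.items():
        if (abs(width - known_width) <= tolerance
                and abs(height - known_height) <= tolerance):
            return name
    return None


print(guess_paper_size([0, 0, 612, 792]))  # prints "Letter"
```

The tolerance absorbs the small rounding differences that appear when inches are converted to points and back.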

Within the chromium source code the command Page.printToPDF is located in the file page_handler.cc

void PageHandler::PrintToPDF(Maybe<bool> landscape,
                             Maybe<bool> display_header_footer,
                             Maybe<bool> print_background,
                             Maybe<double> scale,
                             Maybe<double> paper_width,
                             Maybe<double> paper_height,
                             Maybe<double> margin_top,
                             Maybe<double> margin_bottom,
                             Maybe<double> margin_left,
                             Maybe<double> margin_right,
                             Maybe<String> page_ranges,
                             Maybe<bool> ignore_invalid_page_ranges,
                             Maybe<String> header_template,
                             Maybe<String> footer_template,
                             Maybe<bool> prefer_css_page_size,
                             Maybe<String> transfer_mode,
                             std::unique_ptr<PrintToPDFCallback> callback)

This function allows the parameters paper_width and paper_height to be modified. Both take a double, a floating-point type, so they accept whole and fractional values alike (for example 8.27).

These parameters have default values, which are defined in the Chrome DevTools Protocol:

paperWidth: Paper width in inches. Defaults to 8.5 inches.
paperHeight: Paper height in inches. Defaults to 11 inches

Note that the parameter names differ between the Chromium source code and the Chrome DevTools Protocol: the C++ source uses snake_case, while the protocol, which is what a client has to send, uses camelCase.

paper_width in the Chromium source code
paperWidth in the Chrome DevTools Protocol
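
The signature above has no format parameter, so a name such as "A4" has to be translated into inches by the caller. A minimal sketch of that translation, assuming a hypothetical helper pdf_params and a small lookup table of paper sizes in millimetres (neither is part of Selenium or Chrome):

```python
MM_PER_INCH = 25.4

# Hypothetical lookup table: format name -> (width_mm, height_mm)
PAPER_SIZES_MM = {
    "A4": (210, 297),
    "A5": (148, 210),
    "Letter": (215.9, 279.4),
}


def pdf_params(format_name, landscape=False):
    """Build a Page.printToPDF params dict using the protocol's camelCase names."""
    width_mm, height_mm = PAPER_SIZES_MM[format_name]
    return {
        "landscape": landscape,
        # the protocol expects numbers in inches, not strings
        "paperWidth": round(width_mm / MM_PER_INCH, 2),
        "paperHeight": round(height_mm / MM_PER_INCH, 2),
    }


print(pdf_params("A4"))
# {'landscape': False, 'paperWidth': 8.27, 'paperHeight': 11.69}
```

Rounding to two decimals reproduces the 8.27 x 11.69 inch figures commonly quoted for A4.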

According to the chromium source code the command Page.printToPDF is called with SendCommandAndGetResultWithTimeout.

Status WebViewImpl::PrintToPDF(const base::DictionaryValue& params,
                               std::string* pdf) {
  // https://bugs.chromium.org/p/chromedriver/issues/detail?id=3517
  if (!browser_info_->is_headless) {
    return Status(kUnknownError,
                  "PrintToPDF is only supported in headless mode");
  }
  std::unique_ptr<base::DictionaryValue> result;
  Timeout timeout(base::TimeDelta::FromSeconds(10));
  Status status = client_->SendCommandAndGetResultWithTimeout(
      "Page.printToPDF", params, &timeout, &result);
  if (status.IsError()) {
    if (status.code() == kUnknownError) {
      return Status(kInvalidArgument, status);
    }
    return status;
  }
  if (!result->GetString("data", pdf))
    return Status(kUnknownError, "expected string 'data' in response");
  return Status(kOk);
}

In my original answer I used send_command_and_get_result, which is similar to the command SendCommandAndGetResultWithTimeout.

// stub_devtools_client.h

Status SendCommandAndGetResult(
     const std::string& method,
     const base::DictionaryValue& params,
     std::unique_ptr<base::DictionaryValue>* result) override;

Status SendCommandAndGetResultWithTimeout(
     const std::string& method,
     const base::DictionaryValue& params,
     const Timeout* timeout,
     std::unique_ptr<base::DictionaryValue>* result) override;

After looking at the Selenium source code, it's unclear how to correctly pass the commands send_command_and_get_result or send_command_and_get_result_with_timeout.

I did note this function in the webdriver selenium source code:

def execute_cdp_cmd(self, cmd, cmd_args):
     """
     Execute Chrome Devtools Protocol command and get returned result

     The command and command args should follow chrome devtools protocol domains/commands, refer to link
     https://chromedevtools.github.io/devtools-protocol/

     :Args:
      - cmd: A str, command name
      - cmd_args: A dict, command args. empty dict {} if there is no command args

     :Usage:
         driver.execute_cdp_cmd('Network.getResponseBody', {'requestId': requestId})

     :Returns:
         A dict, empty dict {} if there is no result to return.
         For example to getResponseBody:

         {'base64Encoded': False, 'body': 'response body string'}

     """
     return self.execute("executeCdpCommand", {'cmd': cmd, 'params': cmd_args})['value']

After doing some research and testing I found that this function could be used to achieve your use case.

import base64
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfpage import PDFPage
from pdfminer.pdfparser import PDFParser

chrome_options = Options()
chrome_options.add_argument("--disable-infobars")
chrome_options.add_argument("--disable-extensions")
chrome_options.add_argument("--disable-popup-blocking")
chrome_options.add_argument('--headless')

browser = webdriver.Chrome('/usr/local/bin/chromedriver', options=chrome_options)
browser.get('http://www.google.com')

# you can define additional parameters if needed;
# paperWidth and paperHeight are given in inches
params = {'landscape': False,
          'paperWidth': 8.27,
          'paperHeight': 11.69}

# call the function "execute_cdp_cmd" with the command "Page.printToPDF" with
# parameters defined above
data = browser.execute_cdp_cmd("Page.printToPDF", params)

# save the output to a file.
with open('file_name.pdf', 'wb') as file:
    file.write(base64.b64decode(data['data']))

browser.quit()

# verify the page size (MediaBox, in points) of the PDF file created
with open('file_name.pdf', 'rb') as pdf_file:
    parser = PDFParser(pdf_file)
    doc = PDFDocument(parser)
    for page in PDFPage.create_pages(doc):
        print(page.mediabox)
        # output: [0, 0, 594.95996, 840.95996]

The output is in points, which need to be converted to inches (72 points per inch).

594.95996 points equals 8.263332777783 inches
840.95996 points equals 11.6799994445 inches

8.263332777783 x 11.6799994445 inches is, within rounding, the A4 paper size of 8.27 x 11.69 inches.
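
The steps above can be collected into a small reusable helper. This is a sketch under assumptions: the function name save_page_as_pdf and the "%PDF" sanity check are illustrative additions, and driver is assumed to be a headless Chrome WebDriver that exposes execute_cdp_cmd as shown earlier.

```python
import base64

# A4 in inches, as used in the params above
A4_INCHES = {"paperWidth": 8.27, "paperHeight": 11.69}


def save_page_as_pdf(driver, file_name, params=None):
    """Print the current page with Page.printToPDF and write it to file_name.

    Extra params (for example {'landscape': True}) are merged over the
    A4 defaults. Returns the number of bytes written.
    """
    cdp_params = dict(A4_INCHES)
    if params:
        cdp_params.update(params)
    result = driver.execute_cdp_cmd("Page.printToPDF", cdp_params)
    pdf_bytes = base64.b64decode(result["data"])
    # every PDF file starts with the magic bytes "%PDF"
    if not pdf_bytes.startswith(b"%PDF"):
        raise ValueError("Page.printToPDF did not return PDF data")
    with open(file_name, "wb") as file:
        file.write(pdf_bytes)
    return len(pdf_bytes)
```

With the browser object created above, a call would look like save_page_as_pdf(browser, 'file_name.pdf', {'landscape': False}).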

ORIGINAL POST 07-13-2021

There are multiple parameters that you can pass when calling the function Page.printToPDF. Two of those parameters are:

paper_width
paper_height

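The following code passes these parameters to Page.printToPDF. As the update above shows, the PDF produced by this version was still Letter size (8.5 x 11 inches).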

import json
import base64
from selenium import webdriver
from selenium.webdriver.chrome.options import Options


def send_devtools(driver, command, params=None):
    if params is None:
        params = {}
    resource = "/session/%s/chromium/send_command_and_get_result" % driver.session_id
    url = driver.command_executor._url + resource
    body = json.dumps({"cmd": command, "params": params})
    resp = driver.command_executor._request("POST", url, body)
    return resp.get("value")


def create_pdf(driver, file_name):
    command = "Page.printToPDF"
    params = {'paper_width': '8.27', 'paper_height': '11.69'}
    result = send_devtools(driver, command,  params)
    save_pdf(result, file_name)
    return


def save_pdf(data, file_name):
    with open(file_name, 'wb') as file:
        file.write(base64.b64decode(data['data']))
    print('PDF created')


chrome_options = Options()
chrome_options.add_argument("--disable-infobars")
chrome_options.add_argument("--disable-extensions")
chrome_options.add_argument("--disable-popup-blocking")
chrome_options.add_argument('--headless')

browser = webdriver.Chrome('/usr/local/bin/chromedriver', options=chrome_options)
browser.get('http://www.google.com')

create_pdf(browser, 'test_pdf_1.pdf')

----------------------------------------
My system information
----------------------------------------
Platform:       macOS
OS Version:     10.15.7
Python Version: 3.9
Selenium:       3.141.0
pdfminer.six:   20201018
----------------------------------------

answered Oct 18 '22 19:10

Life is complex

Tags: python, google-chrome, selenium, selenium-chromedriver
