
Missing elements when using selenium chrome driver to automatically 'Save as PDF'

I am trying to automatically save, as a PDF, an HTML page created with pdf2htmlEX (https://github.com/coolwanglu/pdf2htmlEX) using the Selenium Chrome WebDriver.

It almost works, except that the figure captions, and sometimes even parts of the figures, are missing.

Manually saved: [screenshot: manually saved]

Automatically saved using Selenium and ChromeDriver: [screenshot: saved using Selenium and ChromeDriver]

Here is my code (you need ChromeDriver (http://chromedriver.chromium.org/downloads) in the same folder as this script):

import json
from selenium import webdriver

# print settings: save as pdf, 'letter' formatting
appState = """{
    "recentDestinations": [
        {
            "id": "Save as PDF",
            "origin": "local"
        }
    ],
    "mediaSize": {
        "height_microns": 279400,
        "name": "NA_LETTER",
        "width_microns": 215900,
        "custom_display_name": "Letter"
    },
    "selectedDestinationId": "Save as PDF",
    "version": 2
}"""

appState = json.loads(appState)
profile = {"printing.print_preview_sticky_settings.appState": json.dumps(appState)}
chrome_options = webdriver.ChromeOptions()
chrome_options.add_experimental_option('prefs', profile)
# Enable automatically pressing the print button in print preview
# https://peter.sh/experiments/chromium-command-line-switches/
chrome_options.add_argument('--kiosk-printing')

driver = webdriver.Chrome('./chromedriver', options=chrome_options)
driver.get('http://www.deeplearningbook.org/contents/intro.html')
driver.execute_script('window.print();')
driver.quit()

This sometimes happens when I print manually, too. But if I then change any of the printing options, the preview reloads and the image captions reappear, and they stay there no matter which options I enable or disable afterwards.

[screenshot: Chrome printing settings]

What I tried so far:

  • different Chrome webdriver versions (71, 72, 73) from this site: http://chromedriver.chromium.org/downloads
  • enable background graphics by adding '"isCssBackgroundEnabled": true' to the appState
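
For reference, the second attempt in the list above amounts to one extra key in the sticky print settings. This is only a minimal sketch, assuming the same appState as in my script (written here as a Python dict); the helper name with_background_graphics is made up for illustration, and as noted this setting alone did not bring the captions back:

```python
import json

# Same print settings as in the script above, written as a Python dict.
app_state = {
    "recentDestinations": [{"id": "Save as PDF", "origin": "local"}],
    "mediaSize": {
        "height_microns": 279400,
        "name": "NA_LETTER",
        "width_microns": 215900,
        "custom_display_name": "Letter",
    },
    "selectedDestinationId": "Save as PDF",
    "version": 2,
}


def with_background_graphics(state):
    """Return a copy of the print settings with background graphics enabled.

    Hypothetical helper, not part of Selenium or Chrome.
    """
    updated = dict(state)
    updated["isCssBackgroundEnabled"] = True
    return updated


# Chrome expects the appState preference as a JSON string, not a dict.
profile = {
    "printing.print_preview_sticky_settings.appState": json.dumps(
        with_background_graphics(app_state)
    )
}
```

The resulting profile dict is passed to chrome_options.add_experimental_option('prefs', profile) exactly as in the script.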
Max S. asked Mar 01 '19 11:03


1 Answer

So, after fiddling around, I came across the solution by accident. I don't really understand why, but enabling 'PrintBrowser mode' ("Enables PrintBrowser mode, in which everything renders as though printed.") solves the issue. This may or may not have to do with CSS loading properly.

I just need to add chrome_options.add_argument('--enable-print-browser') and all elements are there!
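
Put together, the configuration might look roughly like the sketch below. It assumes the same print settings as in the question; the helper names build_print_config and make_chrome_options are made up for illustration, and the selenium import is deferred so the settings can be inspected without a browser installed:

```python
import json


def build_print_config(enable_print_browser=True):
    """Return (prefs, args) for a Chrome session that prints straight to PDF.

    Hypothetical helper; the values mirror the settings from the question.
    """
    app_state = {
        "recentDestinations": [{"id": "Save as PDF", "origin": "local"}],
        "mediaSize": {
            "height_microns": 279400,
            "name": "NA_LETTER",
            "width_microns": 215900,
            "custom_display_name": "Letter",
        },
        "selectedDestinationId": "Save as PDF",
        "version": 2,
    }
    prefs = {
        "printing.print_preview_sticky_settings.appState": json.dumps(app_state)
    }
    # --kiosk-printing presses the print button in the preview automatically.
    args = ["--kiosk-printing"]
    if enable_print_browser:
        # The flag that made the missing captions and figures show up for me.
        args.append("--enable-print-browser")
    return prefs, args


def make_chrome_options(prefs, args):
    """Build ChromeOptions from the prefs and arguments (requires selenium)."""
    from selenium import webdriver  # deferred so the helper above has no dependency

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", prefs)
    for arg in args:
        options.add_argument(arg)
    return options
```

The options object returned by make_chrome_options(*build_print_config()) then takes the place of chrome_options in the question's script; the rest (driver.get, window.print, driver.quit) stays the same.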

Max S. answered Sep 20 '22 10:09