Selenium Chrome save as pdf change download folder

I want to save a web page as a PDF file. That part works fine, but the file should go to a specific path; instead it is saved to my default download directory.

import json
from selenium import webdriver

appState = {
    "recentDestinations": [
        {
            "id": "Save as PDF",
            "origin": "local"
        }
    ],
    "selectedDestinationId": "Save as PDF",
    "version": 2,
    'download.default_directory': 'C:\\Users\\Oli\\Google Drive',
    "download.directory_upgrade": True
}

profile = {'printing.print_preview_sticky_settings.appState': json.dumps(appState)}

chrome_options = webdriver.ChromeOptions()
chrome_options.add_experimental_option('prefs', profile)
chrome_options.add_argument('--kiosk-printing')

driver = webdriver.Chrome(chrome_options=chrome_options)
driver.get('https://www.google.com/')
driver.execute_script('window.print();')

By the way, does anyone have an idea how to save the file under a specific name?

944

asked Feb 07 '19 17:02

Oliver Weidner

3 Answers

The download.default_directory setting only applies to downloaded content. Chrome treats files saved from the page itself (such as a printout to PDF) differently. To change the default folder for a printout of the page, set the savefile.default_directory value instead.

So the full example for printing to PDF in a custom location:

import json
from selenium import webdriver

appState = {
    "recentDestinations": [
        {
            "id": "Save as PDF",
            "origin": "local",
            "account": ""
        }
    ],
    "selectedDestinationId": "Save as PDF",
    "version": 2
}

profile = {'printing.print_preview_sticky_settings.appState': json.dumps(appState),
           'savefile.default_directory': 'path/to/dir/'}

chrome_options = webdriver.ChromeOptions()
chrome_options.add_experimental_option('prefs', profile)
chrome_options.add_argument('--kiosk-printing')

driver = webdriver.Chrome(options=chrome_options)
driver.get('https://www.google.com/')  # same page as in the question
driver.execute_script('window.print();')
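If you need this in more than one place, the prefs can be built by a small helper. This is only a sketch: build_print_prefs and make_driver are names made up for illustration, and resolving the folder to an absolute path (and creating it) is my assumption about what keeps Chrome happy, not something the docs promise.

```python
import json
from pathlib import Path


def build_print_prefs(out_dir):
    """Return Chrome prefs that select "Save as PDF" and save into out_dir."""
    app_state = {
        "recentDestinations": [
            {"id": "Save as PDF", "origin": "local", "account": ""}
        ],
        "selectedDestinationId": "Save as PDF",
        "version": 2,
    }
    # Assumption: an absolute, existing folder is the safest value to hand
    # to Chrome, so resolve the path and create the folder if it is missing.
    directory = Path(out_dir).expanduser().resolve()
    directory.mkdir(parents=True, exist_ok=True)
    return {
        "printing.print_preview_sticky_settings.appState": json.dumps(app_state),
        "savefile.default_directory": str(directory),
    }


def make_driver(out_dir):
    """Start Chrome with kiosk printing enabled (needs selenium and Chrome)."""
    # Imported lazily so build_print_prefs can be used without selenium.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", build_print_prefs(out_dir))
    options.add_argument("--kiosk-printing")
    return webdriver.Chrome(options=options)
```

With that, make_driver(r'C:\Users\Oli\Google Drive') followed by driver.get(...) and window.print() should behave like the example above.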

191

answered Nov 08 '22 19:11

kayoz

One more workaround: just save the file as is, then move and rename it as needed.

The idea of the code below: check the modification time of every PDF file in the download directory and compare it with the current time. If the difference is less than some value (say 15 seconds), this is presumably the right file, so move/rename it to where you need it.

import os
import time
import json
from selenium import webdriver

appState = {
    "recentDestinations": [
        {
            "id": "Save as PDF",
            "origin": "local"
        }
    ],
    "selectedDestinationId": "Save as PDF",
    "version": 2
}

profile = {'printing.print_preview_sticky_settings.appState': json.dumps(appState)}

download_path = r'C:\Users\Oli\Downloads'  # Folder where the browser saves files
new_path = r'C:\Users\Oli\Google Drive'  # Folder to move the file to

chrome_options = webdriver.ChromeOptions()
chrome_options.add_experimental_option('prefs', profile)
chrome_options.add_argument('--kiosk-printing')
driver = webdriver.Chrome(options=chrome_options)

driver.get('http://example.com/')
driver.execute_script('window.print();')
time.sleep(5)  # Assumed wait so Chrome can finish writing the PDF; tune as needed

new_filename = 'new_name.pdf'  # Name for the moved file
timestamp_now = time.time()  # Current time
# Go through the files in the download directory
for (dirpath, dirnames, filenames) in os.walk(download_path):
    for filename in filenames:
        if filename.lower().endswith('.pdf'):
            # Join with dirpath so files in subfolders resolve correctly
            full_path = os.path.join(dirpath, filename)
            timestamp_file = os.path.getmtime(full_path)  # Last modification time
            # If the file was modified less than 15 seconds ago, move it
            if (timestamp_now - timestamp_file) < 15:
                full_new_path = os.path.join(new_path, new_filename)
                os.rename(full_path, full_new_path)
                print(full_path + ' was moved to ' + full_new_path)

Note: this is just an example. You need to think through all of your actions. To make the code stable you will probably need to add some exception handling, and it is better to move the extra code into a function.
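As a rough sketch of what "move it into a function" could look like: the helper below polls the download folder until a PDF newer than a given start time shows up, then moves it. The name move_latest_pdf and its parameters are made up for illustration, and it assumes Chrome has finished writing the file by the time it appears, which may not hold for large pages.

```python
import shutil
import time
from pathlib import Path


def move_latest_pdf(download_dir, target_path, started_at, timeout=15.0, poll=0.5):
    """Move the newest PDF modified at or after started_at to target_path.

    Returns the final path, or None if no matching file appeared in time.
    """
    download_dir = Path(download_dir)
    target_path = Path(target_path)
    deadline = time.time() + timeout
    while True:
        # Only look at PDFs (any letter case) touched since started_at
        candidates = [
            p for p in download_dir.iterdir()
            if p.is_file()
            and p.suffix.lower() == ".pdf"
            and p.stat().st_mtime >= started_at
        ]
        if candidates:
            newest = max(candidates, key=lambda p: p.stat().st_mtime)
            target_path.parent.mkdir(parents=True, exist_ok=True)
            # shutil.move also works across drives, unlike os.rename
            shutil.move(str(newest), str(target_path))
            return target_path
        if time.time() >= deadline:
            return None
        time.sleep(poll)
```

Take started_at = time.time() just before calling window.print(), then call move_latest_pdf(download_path, os.path.join(new_path, new_filename), started_at). If your filesystem stores coarse timestamps, subtracting a second from started_at is a cheap safety margin.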

answered Nov 08 '22 20:11

Litvin

The key is to call execute_cdp_cmd on the driver instance (not on the webdriver module):

pdf = driver.execute_cdp_cmd(
    "Page.printToPDF", {
        "printBackground": True,
    })

The result's 'data' field holds the PDF as a base64 string, so you can decode it and write the file wherever you want, under any name. Here is a full example:

import base64
from typing import Optional
from pathlib import Path
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

def svg_to_pdf_chromium(filename: Path, out_dir: Optional[Path] = None):
    """Convert an SVG on disk to a PDF using Selenium and ChromeDriver"""

    if out_dir is None:
        out_dir = filename.parent

    service = Service(ChromeDriverManager().install())

    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--kiosk-printing')
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-gpu')
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--window-size=2000,2000")
    chrome_options.add_argument('--disable-dev-shm-usage')

    webdriver_chrome = webdriver.Chrome(
        service=service, options=chrome_options)

    # as_uri() needs an absolute path and yields a valid file:// URL
    webdriver_chrome.get(filename.resolve().as_uri())
    pdf = webdriver_chrome.execute_cdp_cmd(
        "Page.printToPDF", {
            "printBackground": True,
            "landscape": True,
            "displayHeaderFooter": False,
            "scale": 0.75,
            })
    webdriver_chrome.quit()  # quit() also shuts down the driver process
    # The PDF comes back base64-encoded in the 'data' field
    with open(out_dir / f'{filename.stem}.pdf', "wb") as f:
        f.write(base64.b64decode(pdf['data']))

# OUTPUT must be a pathlib.Path defined by you (it is not defined in this snippet)
svg_to_pdf_chromium(OUTPUT / "svg" / "mysvg.svg")

This also removes the need for the ugly wait time.

Options available with Page.printToPDF are listed in the Chrome DevTools docs.
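For the "specific folder and specific name" part of the question, the decoding step can be pulled out into a tiny helper. This is a sketch, not the one true way: save_cdp_pdf and A4_PORTRAIT are names I made up, and the paper size values assume you want A4 (paperWidth and paperHeight are given in inches).

```python
import base64
from pathlib import Path


def save_cdp_pdf(result, out_path):
    """Decode the base64 'data' field of a Page.printToPDF result and write it."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_bytes(base64.b64decode(result["data"]))
    return out_path


# Example parameters for Page.printToPDF; A4 expressed in inches.
A4_PORTRAIT = {
    "printBackground": True,
    "landscape": False,
    "displayHeaderFooter": False,
    "paperWidth": 8.27,
    "paperHeight": 11.69,
}

# Usage with a live driver (needs selenium and Chrome), shown as comments:
# result = driver.execute_cdp_cmd("Page.printToPDF", A4_PORTRAIT)
# save_cdp_pdf(result, r"C:\Users\Oli\Google Drive\new_name.pdf")
```

Because the file is written by your own code, both the folder and the file name are entirely up to you, and no print preview prefs are involved.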

answered Nov 08 '22 19:11

Alex

Tags:

python

selenium

web-scraping