I want to save a web page as a PDF file. Printing works fine, but the file should be saved to a specific path; instead it is saved to my default download directory.
import json
from selenium import webdriver
appState = {
    "recentDestinations": [
        {
            "id": "Save as PDF",
            "origin": "local"
        }
    ],
    "selectedDestinationId": "Save as PDF",
    "version": 2,
    'download.default_directory': 'C:\\Users\\Oli\\Google Drive',
    "download.directory_upgrade": True
}
profile = {'printing.print_preview_sticky_settings.appState': json.dumps(appState)}
chrome_options = webdriver.ChromeOptions()
chrome_options.add_experimental_option('prefs', profile)
chrome_options.add_argument('--kiosk-printing')
driver = webdriver.Chrome(options=chrome_options)  # 'chrome_options=' is deprecated in Selenium 4
driver.get('https://www.google.com/')
driver.execute_script('window.print();')
By the way, does anyone have an idea how to save the file under a specific name?
To handle a PDF document in Selenium test automation, we can use a Java library called PDFBox. Apache PDFBox is an open-source library for working with PDF documents. We can use it to verify the text in a document, extract a specific section of text or an image, and so on.
We can save a PDF file from Chrome using the Selenium WebDriver. To download files to a specific location with Selenium in Python, use the ChromeOptions class: create an options object, set the browser preferences on it with add_experimental_option, and give the path of the download directory in the download.default_directory preference.
options: holds the preferences for the Chrome browser.
download.default_directory: changes the default download directory. For example, setting it to C:\Tutorial\down means that files will be downloaded to that location.
add_experimental_option: adds these preferences to the options object that is passed to the Selenium WebDriver.
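As a minimal sketch of the preference setup described above, the helpers below build the preferences dictionary and attach it to a ChromeOptions object. The function names and the example directory are illustrative assumptions, and download.prompt_for_download is an extra Chrome preference (not mentioned above) that is assumed here to suppress the save dialog.

```python
import os


def build_download_prefs(download_dir):
    """Return Chrome preferences that send downloads to download_dir.

    Hypothetical helper for illustration; only the preference keys are Chrome's.
    """
    return {
        # Use an absolute path for the download directory.
        "download.default_directory": os.path.abspath(download_dir),
        "download.directory_upgrade": True,
        # Assumption: also turn off the "Save as" prompt.
        "download.prompt_for_download": False,
    }


def make_chrome_options(prefs):
    """Attach the preferences to a ChromeOptions object (requires selenium)."""
    # Imported lazily so build_download_prefs can be used without selenium.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", prefs)
    return options


# Usage (needs selenium and Chrome installed):
#   options = make_chrome_options(build_download_prefs(r"C:\Tutorial\down"))
#   driver = webdriver.Chrome(options=options)
```

Note that these preferences only affect regular downloads, not pages saved through the print dialog.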
The download.default_directory setting only applies to downloaded content. Chrome treats files saved from the page itself, such as a print-to-PDF printout, differently. To change the default folder for a printout of the page, set the savefile.default_directory value instead.
So the full example to print to PDF in a custom location:
import json
from selenium import webdriver
appState = {
    "recentDestinations": [
        {
            "id": "Save as PDF",
            "origin": "local",
            "account": ""
        }
    ],
    "selectedDestinationId": "Save as PDF",
    "version": 2
}
profile = {'printing.print_preview_sticky_settings.appState': json.dumps(appState),
           'savefile.default_directory': 'path/to/dir/'}
chrome_options = webdriver.ChromeOptions()
chrome_options.add_experimental_option('prefs', profile)
chrome_options.add_argument('--kiosk-printing')
driver = webdriver.Chrome(options=chrome_options)
url = 'https://www.google.com/'  # page to print (URL taken from the question)
driver.get(url)
driver.execute_script('window.print();')
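Because window.print() returns before Chrome has finished writing the file, a script usually has to wait for the PDF to show up. The following is a rough sketch of one way to do that by polling the target directory; the helper names list_pdfs and wait_for_new_pdf and the default timeout are assumptions for illustration, and the sketch does not check that the file has been completely written.

```python
import os
import time


def list_pdfs(directory):
    """Return the set of PDF file names currently in directory."""
    return {name for name in os.listdir(directory)
            if name.lower().endswith(".pdf")}


def wait_for_new_pdf(directory, known, timeout=15.0, poll=0.25):
    """Poll directory until a PDF that is not in `known` appears.

    Returns the full path of the new file, or raises TimeoutError
    if nothing new shows up within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        new_files = list_pdfs(directory) - known
        if new_files:
            # If several files appeared, pick the alphabetically first one.
            return os.path.join(directory, sorted(new_files)[0])
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no new PDF appeared in {directory!r}")
        time.sleep(poll)


# Usage with the example above (driver is the Chrome instance created there):
#   before = list_pdfs('path/to/dir/')
#   driver.execute_script('window.print();')
#   pdf_path = wait_for_new_pdf('path/to/dir/', before)
```

Taking the snapshot before printing avoids picking up PDFs that were already in the folder.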
One more workaround: just save the file as is, then move and rename it as needed.
The idea of the code below: check the modification time of every PDF file in the download directory and compare it with the current time. If the difference is less than some threshold (say 15 seconds), this is presumably the right file, so move and rename it as needed.
import os
import time
import json
from selenium import webdriver
appState = {
    "recentDestinations": [
        {
            "id": "Save as PDF",
            "origin": "local"
        }
    ],
    "selectedDestinationId": "Save as PDF",
    "version": 2
}
profile = {'printing.print_preview_sticky_settings.appState': json.dumps(appState)}
download_path = r'C:\Users\Oli\Downloads'  # path where the browser saves files
new_path = r'C:\Users\Oli\Google Drive'  # path to move the file to
chrome_options = webdriver.ChromeOptions()
chrome_options.add_experimental_option('prefs', profile)
chrome_options.add_argument('--kiosk-printing')
driver = webdriver.Chrome(options=chrome_options)  # 'chrome_options=' is deprecated in Selenium 4
driver.get('http://example.com/')
driver.execute_script('window.print();')
new_filename = 'new_name.pdf'  # name to give the file
timestamp_now = time.time()  # current time
# Walk through the files in the download directory (including subdirectories)
for dirpath, dirnames, filenames in os.walk(download_path):
    for filename in filenames:
        if filename.lower().endswith('.pdf'):
            # join with dirpath so that files in subdirectories resolve correctly
            full_path = os.path.join(dirpath, filename)
            timestamp_file = os.path.getmtime(full_path)  # time of last modification
            # if the file was modified less than 15 seconds ago, move it
            if (timestamp_now - timestamp_file) < 15:
                full_new_path = os.path.join(new_path, new_filename)
                os.rename(full_path, full_new_path)
                print(full_path + ' is moved to ' + full_new_path)
Note: this is just an example. You need to think through all of your actions. To make the code stable you might need to add some exception handling, and it would be better to move this additional code into a function, and so on.
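Moving the logic into a function, as the note suggests, might look like the sketch below. The name move_recent_pdf and its parameters are hypothetical; it looks only at the top level of the download directory, picks the most recently modified PDF, and uses shutil.move instead of os.rename so that the move should also work across drives.

```python
import os
import shutil
import time


def move_recent_pdf(download_dir, target_path, max_age=15.0):
    """Move the most recently modified PDF in download_dir to target_path.

    Returns target_path on success, or None if no PDF in download_dir
    was modified within the last max_age seconds.
    """
    now = time.time()
    candidates = []
    for name in os.listdir(download_dir):
        full_path = os.path.join(download_dir, name)
        if name.lower().endswith(".pdf") and os.path.isfile(full_path):
            mtime = os.path.getmtime(full_path)
            if now - mtime < max_age:
                candidates.append((mtime, full_path))
    if not candidates:
        return None
    newest = max(candidates)[1]  # candidate with the latest modification time
    # Create the destination folder if it does not exist yet.
    os.makedirs(os.path.dirname(os.path.abspath(target_path)), exist_ok=True)
    shutil.move(newest, target_path)
    return target_path


# Usage with the paths from the example above:
#   move_recent_pdf(download_path, os.path.join(new_path, 'new_name.pdf'))
```

Returning None instead of raising leaves it to the caller to decide whether to retry or report an error.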
The key is to use:
# driver is a Chrome WebDriver instance
pdf = driver.execute_cdp_cmd(
    "Page.printToPDF", {
        "printBackground": True,
    })
The command returns the PDF as base64-encoded data, which you can then write wherever you want. Here is a full example:
import base64
from typing import Optional
from pathlib import Path
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
def svg_to_pdf_chromium(filename: Path, out_dir: Optional[Path] = None):
    """Convert an SVG on disk to a PDF using Selenium and ChromeDriver."""
    if out_dir is None:
        out_dir = filename.parents[0]
    service = Service(ChromeDriverManager().install())
    # create the options object before adding arguments to it
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--kiosk-printing')
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-gpu')
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--window-size=2000,2000")  # width,height
    chrome_options.add_argument('--disable-dev-shm-usage')
    webdriver_chrome = webdriver.Chrome(
        service=service, options=chrome_options)
    # as_uri() builds a valid file:// URL on every platform
    webdriver_chrome.get(filename.resolve().as_uri())
    # Page.printToPDF returns the document as base64-encoded data
    pdf = webdriver_chrome.execute_cdp_cmd(
        "Page.printToPDF", {
            "printBackground": True,
            "landscape": True,
            "displayHeaderFooter": False,
            "scale": 0.75,
        })
    webdriver_chrome.quit()  # end the browser session and the driver process
    with open(out_dir / f'{filename.stem}.pdf', "wb") as f:
        f.write(base64.b64decode(pdf['data']))


# OUTPUT is a Path that must be defined elsewhere; it is not shown in this snippet
svg_to_pdf_chromium(OUTPUT / "svg" / "mysvg.svg")
This approach also removes the need for the ugly wait time.
The options available with Page.printToPDF are listed in the Chrome DevTools docs.
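Since the PDF arrives as data rather than as a file, choosing both the folder and the file name comes down to an ordinary file write. A small sketch of that step is shown below; save_cdp_pdf is a hypothetical helper name, and it assumes the result dictionary carries the base64 payload under the "data" key, as in the example above.

```python
import base64
from pathlib import Path


def save_cdp_pdf(result, out_path):
    """Decode the base64 'data' field of a Page.printToPDF result
    and write the bytes to out_path, creating parent folders as needed.
    """
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_bytes(base64.b64decode(result["data"]))
    return out_path


# Usage (driver is a Chrome WebDriver instance, as in the examples above):
#   result = driver.execute_cdp_cmd("Page.printToPDF", {"printBackground": True})
#   save_cdp_pdf(result, r"C:\Users\Oli\Google Drive\new_name.pdf")
```

This covers both parts of the original question, the target directory and the file name, without relying on Chrome's print preferences.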