
Playwright page.pdf() only gets one page

I have been trying to convert HTML to PDF. I have tried many tools, but none of them worked. I am now using Playwright. It converts the page to PDF, but the PDF contains only the first screen of the page, and the content on the right side of that page is cut off.

import os
import time
import pathlib
from playwright.sync_api import sync_playwright

# Build the file:// URL from the local path (as_uri() yields file:///C:/... on Windows)
filePath = os.path.abspath("Lab6.html")
fileUrl = pathlib.Path(filePath).as_uri()
with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto(fileUrl)
    for i in range(5):  # this scrolling has no effect
        page.mouse.wheel(0, 15000)
        time.sleep(2)
    page.wait_for_load_state('networkidle')
    page.emulate_media(media="screen")
    page.pdf(path="sales_report.pdf")
    browser.close()

[Image: HTML view]

PDF file after running the script:

[Image: PDF view]

I have tried almost every tool I could find online. I also used Selenium, with the same result. I thought the page was not loading fully, so I added waits and manually scrolled the whole page to load the content. Everything gives the same result.

The HTML I am converting: https://drive.google.com/file/d/16jEq52iXtAMCg2FDt3VbQN0dCQmdTip_/view?usp=sharing

farhan jatt asked Oct 25 '25 07:10


1 Answer

Here's a somewhat dirty solution that worked on my end. The sleep-and-scroll loop isn't great and can probably be improved, but I'll leave this as a starting point and tighten it up later if I have time (feel free to do the same).

from playwright.sync_api import sync_playwright # 1.37.0
from time import sleep


# read the saved page; explicit encoding avoids platform-default surprises
with open("index.html", encoding="utf-8") as f:
    html = f.read()

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.set_content(html)

    # focus inside the annoying border to enable scroll
    page.click(".document_container")

    for i in range(10):
        page.mouse.wheel(0, 2500)
        sleep(0.5)

    # strip out the annoying border that messes up PDF generation
    page.evaluate("""() => {
        const el = document.querySelector(".document_scroller");
        el.parentElement.appendChild(el.querySelector(".document_container"));
        el.remove();
    }""")
    page.emulate_media(media="screen")
    page.pdf(path="sales_report.pdf")
    browser.close()

Two tricks:

  1. Clicking inside the border area enables scrolling, which appears to be necessary for all of the content to load.
  2. Removing the annoying border lets the PDF generation capture all pages. While the border is present, the main body has no scroll, only the interior container does, and the PDF capture doesn't seem to handle that.
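One possible way to tighten the fixed `range(10)` sleep-and-scroll loop is to keep wheeling until the inner container stops moving. The sketch below is untested against the linked HTML and rests on a few assumptions: the helper name `scroll_until_settled` and its parameters are hypothetical, and it assumes `.document_scroller` is the element that actually scrolls. It only uses `page.click`, `page.mouse.wheel`, and `page.evaluate`, and takes the Playwright `page` as an argument.

```python
import time


def scroll_until_settled(page, scroller=".document_scroller",
                         target=".document_container",
                         step=2500, pause=0.5,
                         settle_rounds=3, max_rounds=200):
    """Wheel-scroll until the scroller's scrollTop stops changing.

    Assumption: `scroller` is the element that owns the scrollbar.
    Returns the number of wheel events that were sent.
    """
    read_top = "sel => document.querySelector(sel).scrollTop"

    # Click inside the container so the mouse sits over the scrollable area.
    page.click(target)

    unchanged = 0
    previous = page.evaluate(read_top, scroller)
    for sent in range(1, max_rounds + 1):
        page.mouse.wheel(0, step)
        time.sleep(pause)  # give lazily loaded content a moment to appear
        current = page.evaluate(read_top, scroller)
        # Count consecutive rounds in which the position did not move.
        unchanged = unchanged + 1 if current == previous else 0
        if unchanged >= settle_rounds:
            return sent
        previous = current
    return max_rounds  # gave up; the page may still be loading content
```

In the script above, `scroll_until_settled(page)` would take the place of the `for i in range(10)` loop, before the `page.evaluate` call that strips the border. Requiring several unchanged rounds rather than one is a guess at how long lazy content takes to load; `pause` and `settle_rounds` may need tuning.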
ggorlen answered Oct 26 '25 23:10


