I am working on generating a PDF from a web page.
The application I am working on is a single-page application.
I tried many of the options and suggestions at https://github.com/GoogleChrome/puppeteer/issues/1412
but none of them worked:
const browser = await puppeteer.launch({
  executablePath: 'C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe',
  ignoreHTTPSErrors: true,
  headless: true,
  devtools: false,
  args: ['--no-sandbox', '--disable-setuid-sandbox']
});
const page = await browser.newPage();
await page.goto(fullUrl, { waitUntil: 'networkidle2' });
await page.type('#username', 'scott');
await page.type('#password', 'tiger');
await page.click('#Login_Button');
await page.waitFor(2000);
await page.pdf({
  path: outputFileName,
  displayHeaderFooter: true,
  headerTemplate: '',
  footerTemplate: '',
  printBackground: true,
  format: 'A4'
});
What I want is to generate the PDF report as soon as the page has loaded completely.
I don't want to write any kind of fixed delay, e.g. await page.waitFor(2000);
I cannot use waitForSelector because the page has charts and graphs that are rendered after calculations.
Any help will be appreciated.
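Call page.goto with the URL and { waitUntil: 'domcontentloaded' } to navigate to the URL and wait until the DOMContentLoaded event fires. Note that this event fires once the HTML has been parsed, which can be before images, stylesheets, and script-rendered content have finished loading.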
The Puppeteer Page class extends Node.js's native EventEmitter, which means that whenever you call page.on(), you are setting up an event listener using Node.
By default, the navigation timeout is 30000 milliseconds (30 seconds). If navigation does not complete within this time, page.goto() throws an error. With the waitUntil option set to 'networkidle0' or 'networkidle2', Puppeteer waits for the network to become idle.
page.waitForSelector() and page.waitForXPath() are used to wait for an element. By default, the timeout is 30 seconds in Puppeteer, but it can be changed as required: page.setDefaultTimeout(ms) changes the default timeout, and page.setDefaultNavigationTimeout(ms) changes the navigation timeout.
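As a rough sketch of how these two setters might be applied together, the helper below sets both timeouts on a page object. The name applyTimeouts and the option names actionMs and navigationMs are made up for illustration; only page.setDefaultTimeout and page.setDefaultNavigationTimeout come from Puppeteer.

```javascript
// Hypothetical helper: `applyTimeouts`, `actionMs` and `navigationMs` are illustrative names.
function applyTimeouts(page, { actionMs = 30000, navigationMs = 30000 } = {}) {
  // Default timeout for waits such as page.waitForSelector().
  page.setDefaultTimeout(actionMs);
  // Timeout for navigations such as page.goto() and page.waitForNavigation().
  page.setDefaultNavigationTimeout(navigationMs);
  return { actionMs, navigationMs };
}
```

For a slow report page one might call applyTimeouts(page, { navigationMs: 60000 }) right after browser.newPage().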
An explicit (static) wait pauses for a fixed amount of time. For example, an explicit wait of 50 seconds means the script waits 50 seconds before executing the next line of code. page.waitFor(time in ms) is used to perform an explicit wait in Puppeteer (this method is deprecated in later Puppeteer releases).
If you have been using Puppeteer on your own, you will know it comes with lots of intricacies and pain points. One of those is ensuring all of your content is fully loaded before outputting your result, whether a PDF or an image. What is the best way to ensure your content is all there?
You can use page.waitForNavigation() to wait for the new page to load completely before generating a PDF:
await page.goto(fullUrl, {
  waitUntil: 'networkidle0',
});
await page.type('#username', 'scott');
await page.type('#password', 'tiger');
await page.click('#Login_Button');
await page.waitForNavigation({
  waitUntil: 'networkidle0',
});
await page.pdf({
  path: outputFileName,
  displayHeaderFooter: true,
  headerTemplate: '',
  footerTemplate: '',
  printBackground: true,
  format: 'A4',
});
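One caveat with calling page.click() and then page.waitForNavigation() in sequence is that a fast navigation could, in principle, finish before the wait is registered. A common pattern is to start both together with Promise.all. The sketch below assumes the login click really does trigger a navigation; in a single-page application that may not be the case, and the wait would then time out. The helper name clickAndWaitForNavigation is illustrative only.

```javascript
// Hypothetical helper: the name and parameters are illustrative, not part of Puppeteer.
async function clickAndWaitForNavigation(page, selector, waitUntil = 'networkidle0') {
  // Register the navigation wait first, then click, so the navigation cannot be missed.
  const [response] = await Promise.all([
    page.waitForNavigation({ waitUntil }),
    page.click(selector),
  ]);
  return response;
}
```

In the login flow above this would replace the separate click and waitForNavigation calls, e.g. await clickAndWaitForNavigation(page, '#Login_Button');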
If there is a certain element that is generated dynamically that you would like included in your PDF, consider using page.waitForSelector() to ensure that the content is visible:
await page.waitForSelector('#example', { visible: true, });
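When no single selector marks the end of rendering (for example, charts drawn after calculations), one possible alternative is to have the application set a flag once it is done and wait for that flag with page.waitForFunction(). This is only a sketch under that assumption: the flag name __reportReady and the helper waitForAppFlag are invented here, and the application code would need to set window.__reportReady = true itself.

```javascript
// Hypothetical helper; `__reportReady` is a made-up flag that the application
// would have to set (window.__reportReady = true) once its charts are drawn.
async function waitForAppFlag(page, flagName = '__reportReady', timeout = 30000) {
  // The callback runs in the browser context; flagName is passed in as an argument.
  await page.waitForFunction(
    (name) => window[name] === true,
    { timeout },
    flagName
  );
}
```

It could then be called just before page.pdf(), e.g. await waitForAppFlag(page);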
Sometimes the networkidle events do not give a reliable indication that the page has completely loaded. There could still be a few JS scripts modifying the content on the page. Watching for the browser to stop modifying the HTML source seems to yield better results. Here's a function you could use:
const waitTillHTMLRendered = async (page, timeout = 30000) => {
  const checkDurationMsecs = 1000;
  const maxChecks = timeout / checkDurationMsecs;
  let lastHTMLSize = 0;
  let checkCounts = 1;
  let countStableSizeIterations = 0;
  const minStableSizeIterations = 3;

  while (checkCounts++ <= maxChecks) {
    const html = await page.content();
    const currentHTMLSize = html.length;
    // bodyHTMLSize is only logged; stability is judged on the full document size.
    const bodyHTMLSize = await page.evaluate(() => document.body.innerHTML.length);
    console.log('last: ', lastHTMLSize, ' <> curr: ', currentHTMLSize, " body html size: ", bodyHTMLSize);

    if (lastHTMLSize != 0 && currentHTMLSize == lastHTMLSize)
      countStableSizeIterations++;
    else
      countStableSizeIterations = 0; // reset the counter

    // Treat the page as rendered once the size is unchanged for several checks in a row.
    if (countStableSizeIterations >= minStableSizeIterations) {
      console.log("Page rendered fully..");
      break;
    }

    lastHTMLSize = currentHTMLSize;
    // A plain timer instead of page.waitFor(), which is deprecated in later Puppeteer releases.
    await new Promise((resolve) => setTimeout(resolve, checkDurationMsecs));
  }
};
You could use this after the page load / click function call and before you process the page content, e.g.
await page.goto(url, {'timeout': 10000, 'waitUntil': 'load'});
await waitTillHTMLRendered(page);
const data = await page.content();