There is a web page that contains many rows of data that are continually updated.
There is a fixed number of rows, so old rows are cycled out and not stored anywhere.
This page is broken up by a "load more" button that will appear until all of the stored rows are displayed on the page.
I need to write a script in Puppeteer / Node.js that clicks that button until it no longer exists on the page...
THEN
...read all the text on the page. (I have this part of the script finished.)
I am new to Puppeteer and not sure how to set this up. Any help would be greatly appreciated.
EDIT:
I added this block:
const cssSelector = await page.evaluate(() => document.cssSelector('.u-field-button Button-button-18U-i'));
// Click the "load more" button repeatedly until it no longer appears
const isElementVisible = async (page, cssSelector) => {
await page.waitForSelector(cssSelector, { visible: true, timeout: 2000 })
.catch(() => {
return false;
});
return true;
};
let loadMoreVisible = await isElementVisible(page, cssSelector);
while (loadMoreVisible) {
await page.click(cssSelector);
loadMoreVisible = await isElementVisible(page, cssSelector);
}
But I am getting this error:
Error: Evaluation failed: TypeError: document.cssSelector is not a function
at __puppeteer_evaluation_script__:1:17
at ExecutionContext.evaluateHandle (/Users/reallymemorable/node_modules/puppeteer/lib/ExecutionContext.js:124:13)
at process.internalTickCallback (internal/process/next_tick.js:77:7)
-- ASYNC --
at ExecutionContext.<anonymous> (/Users/reallymemorable/node_modules/puppeteer/lib/helper.js:144:27)
at ExecutionContext.evaluate (/Users/reallymemorable/node_modules/puppeteer/lib/ExecutionContext.js:58:31)
at ExecutionContext.<anonymous> (/Users/reallymemorable/node_modules/puppeteer/lib/helper.js:145:23)
at Frame.evaluate (/Users/reallymemorable/node_modules/puppeteer/lib/FrameManager.js:439:20)
at process.internalTickCallback (internal/process/next_tick.js:77:7)
-- ASYNC --
at Frame.<anonymous> (/Users/reallymemorable/node_modules/puppeteer/lib/helper.js:144:27)
at Page.evaluate (/Users/reallymemorable/node_modules/puppeteer/lib/Page.js:736:43)
at Page.<anonymous> (/Users/reallymemorable/node_modules/puppeteer/lib/helper.js:145:23)
at /Users/reallymemorable/Documents/scripts.scrapers/squarespace.ip.scraper/squarespace5.js:32:34
at process.internalTickCallback (internal/process/next_tick.js:77:7)
(node:8009) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:8009) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
OK, this is what I'd recommend. I'm going to ignore the fact that your data always has a fixed number of rows (that may change in the future) and instead set you up for an unknown number of rows, by clicking the "load more" button until it goes away.
The first thing you want is a method that decides whether the "load more" button is displayed in the UI. You can write it as follows:
const isElementVisible = async (page, cssSelector) => {
let visible = true;
await page
.waitForSelector(cssSelector, { visible: true, timeout: 2000 })
.catch(() => {
visible = false;
});
return visible;
};
Once you pass in your CSS selector (in this case, the selector for your "load more" button), this method returns true if the button is displayed and false if it is not.
The timeout is set to 2000 ms because you want to check for the button repeatedly and fail fast. Without it, waitForSelector would wait for its default of 30000 ms before giving up, and that's far too long to have your code hanging around. I find that 2000 ms is a nice compromise. The purpose of the catch block is to handle the error that is thrown when the button does not become visible before the timeout. You know the button won't be displayed after some number of clicks, and that's fine, so you catch the error and turn it into a false return value instead of letting it fail the script.
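For comparison, here is a minimal sketch of the same check written with try/catch instead of a flag. It assumes page is a Puppeteer Page; the timeoutMs parameter is my own addition for illustration. It also shows why the version in the question always returned true: a return inside a .catch() callback only returns from that callback, not from the enclosing function.

```javascript
// Sketch only: same idea as isElementVisible above, written with try/catch.
// `timeoutMs` is a hypothetical parameter added for illustration.
async function isElementVisible(page, cssSelector, timeoutMs = 2000) {
  try {
    // Resolves once an element matching the selector is visible on the page.
    await page.waitForSelector(cssSelector, { visible: true, timeout: timeoutMs });
    return true;
  } catch (err) {
    // waitForSelector rejects when nothing becomes visible before the timeout.
    return false;
  }
}
```

Both returns here belong to isElementVisible itself, so the result really does depend on whether the wait succeeded.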
The next step is to do something like this, so that your code keeps clicking the "load more" button until it is no longer clickable (i.e. no longer displayed):
let loadMoreVisible = await isElementVisible(page, selectorForLoadMoreButton);
while (loadMoreVisible) {
await page
.click(selectorForLoadMoreButton)
.catch(() => {});
loadMoreVisible = await isElementVisible(page, selectorForLoadMoreButton);
}
This checks whether the button is visible in your UI, clicks it if it is, and repeats until the button is no longer displayed. That ensures all rows of data are on the page before you continue with the remainder of your script.
You also need a catch block on the click action, as shown above. The reason is that headless mode moves very quickly, sometimes too quickly for the UI to keep up. Usually, on the very last appearance of the "load more" button, the isElementVisible method runs before the UI has updated to remove the button, so it returns true when the button is in fact about to disappear. That then triggers an exception from the click call, since the element is no longer there. For me, the cleanest workaround is the empty catch block on the click, so that when this happens the click is skipped cleanly without failing your entire script.
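If you want the check and the loop in one place, something along these lines might work. This is only a sketch: clickUntilGone, timeoutMs and maxClicks are names I made up, and the click cap is an extra safeguard (not part of the code above) so the loop cannot run forever if the button never goes away.

```javascript
// Sketch only: click a "load more" button until it disappears, with a safety cap.
// `clickUntilGone`, `timeoutMs` and `maxClicks` are hypothetical names.
async function clickUntilGone(page, cssSelector, { timeoutMs = 2000, maxClicks = 500 } = {}) {
  // Resolves to true if the selector becomes visible within timeoutMs, false otherwise.
  const isVisible = () =>
    page
      .waitForSelector(cssSelector, { visible: true, timeout: timeoutMs })
      .then(() => true)
      .catch(() => false);

  let clicks = 0;
  while (clicks < maxClicks && (await isVisible())) {
    // The button can vanish between the check and the click, so a failed click is ignored.
    await page.click(cssSelector).catch(() => {});
    clicks += 1;
  }
  return clicks; // number of click attempts made
}
```

You would call it as await clickUntilGone(page, selectorForLoadMoreButton) and then run the part of your script that reads the text.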
Update 1:
You're just using the css selector incorrectly. Your selector should be:
const cssSelector = '.u-field-button Button-button-18U-i'; // This is your CSS selector for the element
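You don't need to use the evaluate method for that. page.waitForSelector and page.click take the selector as a plain string. Also note that, as written, Button-button-18U-i is read as an element (tag) name; if it is a class, it needs its own leading dot (.Button-button-18U-i).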
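The error in the question comes from document.cssSelector, which does not exist; the DOM functions are document.querySelector and document.querySelectorAll. If you want to confirm that your selector string matches anything before looping on it, a small check like this could help. countMatches is a hypothetical helper name, and page is assumed to be a Puppeteer Page.

```javascript
// Sketch only: count how many elements on the page match a selector string.
// `countMatches` is a hypothetical helper name.
async function countMatches(page, cssSelector) {
  // page.$$ runs document.querySelectorAll in the page and resolves to an array of element handles.
  const handles = await page.$$(cssSelector);
  return handles.length;
}
```

If this returns 0 while the button is on screen, the selector string is the problem rather than the loop.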
Update 2:
OK, I've added some improvements. I tested this code extensively on a few different sites and found that my own logic wasn't quite right as a "one size fits all" approach to clicking on this sort of button, which is probably why you're getting those exceptions. I've updated my original answer with all the changes.
Just a quick note: I've updated both the isElementVisible method and the while loop.
Hope this helps!