
Puppeteer / Node.js to click a button as long as it exists -- and when it no longer exists, commence action

There is a web page that contains many rows of data that are continually updated.

There is a fixed number of rows, so old rows are cycled out and not stored anywhere.

This page is broken up by a "load more" button that will appear until all of the stored rows are displayed on the page.

I need to write a script in Puppeteer / Node.js that clicks that button until it no longer exists on the page...

THEN

...read all the text on the page. (I have this part of the script finished.)

I am new to Puppeteer and not sure how to set this up. Any help would be greatly appreciated.

EDIT:

I added this block:

  const cssSelector = await page.evaluate(() => document.cssSelector('.u-field-button Button-button-18U-i'));

  // Click the "load more" button repeatedly until it no longer appears
  const isElementVisible = async (page, cssSelector) => {
    await page.waitForSelector(cssSelector, { visible: true, timeout: 2000 })
    .catch(() => {
      return false;
    });
    return true;
  };

  let loadMoreVisible = await isElementVisible(page, cssSelector);
  while (loadMoreVisible) {
    await page.click(cssSelector);
    loadMoreVisible = await isElementVisible(page, cssSelector);
  }

But I am getting this error:

Error: Evaluation failed: TypeError: document.cssSelector is not a function
    at __puppeteer_evaluation_script__:1:17
    at ExecutionContext.evaluateHandle (/Users/reallymemorable/node_modules/puppeteer/lib/ExecutionContext.js:124:13)
    at process.internalTickCallback (internal/process/next_tick.js:77:7)
  -- ASYNC --
    at ExecutionContext.<anonymous> (/Users/reallymemorable/node_modules/puppeteer/lib/helper.js:144:27)
    at ExecutionContext.evaluate (/Users/reallymemorable/node_modules/puppeteer/lib/ExecutionContext.js:58:31)
    at ExecutionContext.<anonymous> (/Users/reallymemorable/node_modules/puppeteer/lib/helper.js:145:23)
    at Frame.evaluate (/Users/reallymemorable/node_modules/puppeteer/lib/FrameManager.js:439:20)
    at process.internalTickCallback (internal/process/next_tick.js:77:7)
  -- ASYNC --
    at Frame.<anonymous> (/Users/reallymemorable/node_modules/puppeteer/lib/helper.js:144:27)
    at Page.evaluate (/Users/reallymemorable/node_modules/puppeteer/lib/Page.js:736:43)
    at Page.<anonymous> (/Users/reallymemorable/node_modules/puppeteer/lib/helper.js:145:23)
    at /Users/reallymemorable/Documents/scripts.scrapers/squarespace.ip.scraper/squarespace5.js:32:34
    at process.internalTickCallback (internal/process/next_tick.js:77:7)
(node:8009) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:8009) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
reallymemorable asked Dec 07 '22 13:12



1 Answer

Here is what I'd recommend. I'm going to ignore the fact that your data always has a fixed number of rows (that may change in future) and instead handle the general case, where an unknown number of rows is revealed by repeatedly clicking the "load more" button.

First, write a method that reports whether the "load more" button is displayed in the UI:

const isElementVisible = async (page, cssSelector) => {
  let visible = true;
  await page
    .waitForSelector(cssSelector, { visible: true, timeout: 2000 })
    .catch(() => {
      visible = false;
    });
  return visible;
};

Pass in the CSS selector you need (in this case, the selector for your "load more" button) and this method returns true if the button is displayed and false if it is not.

Set the timeout to 2000 ms because you will be checking for this button repeatedly. Without it, the timeout defaults to 30000 ms, which is far too long to leave your code waiting once the button has gone. I find that 2000 ms is a nice compromise.

The purpose of the catch block is to handle the error that waitForSelector throws when the button does not become visible within the timeout. That error is expected: you know the button will stop being displayed after some number of clicks, and that is exactly the state you are trying to reach. So you catch the error and treat it as "not visible" instead of letting it fail the script.
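One trade-off of a catch-all is that it also hides unrelated failures, such as a malformed selector or a closed page. The sketch below is a stricter variant, not part of the original answer: the name isElementVisibleStrict and the timeout parameter are hypothetical, and it assumes the timeout rejection carries the name 'TimeoutError' (which, as far as I know, is how Puppeteer names its timeout errors).

```javascript
// Hypothetical variant of isElementVisible: the timeout is a parameter and
// only timeout rejections are treated as "not visible".
const isElementVisibleStrict = async (page, cssSelector, timeout = 2000) => {
  try {
    await page.waitForSelector(cssSelector, { visible: true, timeout });
    return true;
  } catch (err) {
    // Assumption: a timeout rejection has err.name === 'TimeoutError'.
    if (err && err.name === 'TimeoutError') {
      return false;
    }
    // Anything else (bad selector, closed page, ...) is a real failure.
    throw err;
  }
};
```

It is called the same way as the original method, for example `await isElementVisibleStrict(page, cssSelector, 2000)`.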

Next, keep clicking the "load more" button until it is no longer clickable (i.e. no longer displayed):

let loadMoreVisible = await isElementVisible(page, selectorForLoadMoreButton);
while (loadMoreVisible) {
  await page
    .click(selectorForLoadMoreButton)
    .catch(() => {});
  loadMoreVisible = await isElementVisible(page, selectorForLoadMoreButton);
}

This checks whether the button is visible in your UI, clicks it if it is, and repeats until the button is no longer displayed. That ensures all rows of data are displayed in the UI before you continue with the remainder of your script.

You will also need a catch block on the click action, as shown above, because headless mode moves very quickly, sometimes too quickly for the UI to keep up. Usually, on the very last appearance of the "load more" button, the isElementVisible method executes before the UI has updated to remove the button, so it returns true when the element is in fact no longer displayed. The click request then throws an exception because the element is no longer there. For me, the cleanest workaround is the empty catch block on the click instruction: if this happens, the click is skipped cleanly without failing your entire script.
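The two pieces can be combined into one reusable function. The following is a minimal sketch rather than the answer's exact code: the names isVisible and clickUntilGone, the options object, and the maxClicks cap are all hypothetical additions. The cap is a guard against looping forever if the button never disappears.

```javascript
// Same logic as isElementVisible above, with the timeout passed in.
const isVisible = async (page, cssSelector, timeout) => {
  let visible = true;
  await page
    .waitForSelector(cssSelector, { visible: true, timeout })
    .catch(() => {
      visible = false;
    });
  return visible;
};

// Hypothetical wrapper: clicks the element until it is no longer visible
// or until maxClicks attempts have been made, and returns the number of
// click attempts.
const clickUntilGone = async (
  page,
  cssSelector,
  { timeout = 2000, maxClicks = 500 } = {}
) => {
  let clicks = 0;
  while (clicks < maxClicks && (await isVisible(page, cssSelector, timeout))) {
    // Swallow click errors: the button may vanish between check and click.
    await page.click(cssSelector).catch(() => {});
    clicks += 1;
  }
  return clicks;
};
```

A call such as `await clickUntilGone(page, selectorForLoadMoreButton)` would then sit just before the code that reads the text on the page.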

Update 1:

You're just using the css selector incorrectly. Your selector should be:

const cssSelector = '.u-field-button.Button-button-18U-i'; // CSS selector for the element, assuming both names are classes on the button (a dot before each, no space)

You don't need to use the evaluate method for that.
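If the two names in your selector come straight from the button's class attribute (an assumption, since the page's markup isn't shown here), note that a CSS selector matching several classes on one element needs a dot before each class and no spaces between them. A space means "descendant", and a name without a dot is read as a tag name. The helper below is a hypothetical sketch of that conversion; it assumes simple class names that need no CSS escaping.

```javascript
// Hypothetical helper: turns the value of a class attribute, e.g.
// "u-field-button Button-button-18U-i", into a compound class selector.
// Assumes class names contain only characters that need no CSS escaping.
const classAttributeToSelector = (classAttribute) =>
  classAttribute
    .trim()
    .split(/\s+/)
    .filter(Boolean)
    .map((name) => `.${name}`)
    .join('');

const cssSelector = classAttributeToSelector('u-field-button Button-button-18U-i');
console.log(cssSelector); // prints ".u-field-button.Button-button-18U-i"
```

Either way, the result is a plain string that you pass directly to waitForSelector and click.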

Update 2:

OK, I've added some improvements. I tested this code extensively on a few different sites and found that my own logic wasn't quite right as a "one size fits all" approach to clicking on this sort of button, which is probably why you were getting those exceptions. I've updated my original answer with all the changes.

Just a quick note: I've updated both the isElementVisible method and the while loop.

Hope this helps!

AJC24 answered Dec 10 '22 03:12
