
Using regex for text-based search in a page with Puppeteer

I need to parse a website, selecting links whose text matches a regex, using page.$x in Puppeteer. The page contains:

<a href="">(001)</a>
<a href="">(002)</a>
<a href="">(003)</a>
<a href="">(004)</a>
<a href="">Hello1</a>
<a href="">Hello2</a>
<a href="">WOrld</a>

I am using the code below to fetch all links whose text contains "Hello":

const xpathTxtArr = await page.$x("//*/a[contains(text(), 'Hello')]");

Similarly, I want to know whether I can pass a regular expression such as \d{3} in the page.$x expression to get the element handles of links matching a pattern like (001).

made_in_india asked Sep 17 '25 06:09


2 Answers

I could not find a way to do this with page.$x. To grep the text, I used the evaluate function to collect the inner text of every link.

Here is the sample code:

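const result = await page.evaluate(() => {
    // Runs in the browser context; console.log here prints to the page's console, not Node's.
    console.log('Browser scope.');
    const elementTxtArr = [];
    document.querySelectorAll("a").forEach((a) => {
        console.log(a.innerText);
        elementTxtArr.push(a.innerText);
    });
    // The array of strings is serialized and returned to Node.
    return elementTxtArr;
});
console.log(result);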
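
The evaluate call above returns the text of every link, so the regex still has to be applied afterwards in Node. A minimal sketch of that step, assuming the returned array looks like the sample page in the question; `filterByPattern` and `sampleTexts` are hypothetical names used only for illustration:

```javascript
// Hypothetical helper: keep only the strings that match the given pattern.
function filterByPattern(texts, pattern) {
  return texts.filter((text) => pattern.test(text.trim()));
}

// Stand-in for the array returned by page.evaluate() on the sample page.
const sampleTexts = ['(001)', '(002)', '(003)', '(004)', 'Hello1', 'Hello2', 'WOrld'];

// ^\(\d{3}\)$ matches exactly three digits wrapped in literal parentheses.
const numbered = filterByPattern(sampleTexts, /^\(\d{3}\)$/);
console.log(numbered);
```

In a real script, `sampleTexts` would be replaced by the `result` array from the snippet above.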
made_in_india answered Sep 19 '25 14:09


The attribute substring selectors described at this link may help:

https://drafts.csswg.org/selectors-4/#attribute-substrings

I haven't tried this yet, but maybe something like:

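const result = await page.evaluate(() => {
    console.log('Browser scope.');
    const elementTxtArr = [];
    // [href^="hello"] selects elements whose href attribute starts with "hello";
    // it matches on the attribute value, not on the link text.
    document.querySelectorAll('[href^="hello"]').forEach((a) => {
        console.log(a.innerText);
        elementTxtArr.push(a.innerText);
    });
    return elementTxtArr;
});
console.log(result);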
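
The question asks for element handles rather than text, and XPath 1.0 as implemented in browsers has no regex function. Another option is to fetch every link and test its text in Node. This is a rough sketch, assuming `page` is an already opened Puppeteer Page; `findLinksByText` is a hypothetical helper, not part of Puppeteer:

```javascript
// Hypothetical helper: resolves to the ElementHandles of <a> elements whose
// trimmed text matches `pattern`. `page` is assumed to be a Puppeteer Page.
async function findLinksByText(page, pattern) {
  const handles = await page.$$('a');
  const matches = [];
  for (const handle of handles) {
    // Read the link text inside the browser and return it to Node.
    const text = await handle.evaluate((el) => el.textContent.trim());
    if (pattern.test(text)) {
      matches.push(handle);
    } else {
      // Release handles that are not needed.
      await handle.dispose();
    }
  }
  return matches;
}
```

Usage might look like `const links = await findLinksByText(page, /^\(\d{3}\)$/);`. The pattern should not carry the g flag, because `RegExp.prototype.test` keeps state between calls on global regexes.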
Proximus Seraphim Dimitri Davi answered Sep 19 '25 15:09