I have to parse a website and select elements by matching their text against a regex, using page.$x. The page contains links like these:
<a href="">(001)</a>
<a href="">(002)</a>
<a href="">(003)</a>
<a href="">(004)</a>
<a href="">Hello1</a>
<a href="">Hello2</a>
<a href="">WOrld</a>
I am using the code below
const xpathTxtArr = await page.$x("//*/a[contains(text(), 'Hello')]");
to fetch all links whose text contains "Hello".
Similarly, I want to know whether I can pass a regular expression such as \d{3} in the page.$x expression to get the link handles whose text matches a pattern like (001).
I could not find an answer to that. To get at the text, I used the evaluate function to collect the inner text of every link.
Here is the sample code:
const result = await page.evaluate(() => {
  // This callback runs in the browser, so these logs go to the page console.
  console.log('Browser scope.');
  const elementTxtArr = [];
  document.querySelectorAll("a").forEach((a) => {
    console.log(a.innerText);
    elementTxtArr.push(a.innerText);
  });
  return elementTxtArr;
});
// Back in Node: result is a plain array of strings.
console.log(result);
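Once the texts are back in Node as a plain array, the regex can be applied there instead of inside the XPath expression. The following is a minimal sketch, assuming the array looks like the sample links from the question; `filterByPattern` is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: keep only the strings that match the given regex.
function filterByPattern(texts, pattern) {
  return texts.filter((text) => pattern.test(text));
}

// Sample link texts, copied from the markup in the question.
const linkTexts = ['(001)', '(002)', '(003)', '(004)', 'Hello1', 'Hello2', 'WOrld'];

// Three digits wrapped in parentheses, anchored to the whole string.
const matches = filterByPattern(linkTexts, /^\(\d{3}\)$/);
console.log(matches); // logs the four numbered texts, (001) to (004)
```

In the real script, `linkTexts` would be the `result` array returned by `page.evaluate`.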
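I think you may find help in this link:
https://drafts.csswg.org/selectors-4/#attribute-substrings
I haven't tried this yet, but maybe something like the code below. Note that an attribute substring selector such as [href^="hello"] matches on the value of the href attribute, not on the link text: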
const result = await page.evaluate(async () => {
  console.log('Browser scope.');
  let elementTxtArr = [];
  document.querySelectorAll('[href^="hello"]').forEach((a) => {
    console.log(a.innerText);
    elementTxtArr.push(a.innerText);
  });
  return elementTxtArr;
});
console.log(result);
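If the goal is the link handles rather than their text, one option is to fetch every `a` handle and test each one's text in Node. Browsers evaluate XPath 1.0, which has no regex function, so the pattern cannot go into the `page.$x` expression itself. This is a rough sketch, assuming a Puppeteer `page` object; `findLinksByText` is a hypothetical name, and only `page.$$` and `elementHandle.evaluate` are taken from the Puppeteer API.

```javascript
// Hypothetical helper: return the handles of all <a> elements whose
// trimmed text matches the given regex.
async function findLinksByText(page, pattern) {
  const handles = await page.$$('a'); // all link handles on the page
  const matched = [];
  for (const handle of handles) {
    // Read the text in the browser, then test it against the regex in Node.
    const text = await handle.evaluate((el) => el.textContent || '');
    if (pattern.test(text.trim())) {
      matched.push(handle);
    }
  }
  return matched;
}
```

It could then be called as `const handles = await findLinksByText(page, /^\(\d{3}\)$/);`. The regex should not carry the `g` flag, since `test` on a global regex keeps state between calls.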