
Web parser in JavaScript like Beautiful Soup in Python

Python has a library called Beautiful Soup that parses an HTML tree from a string, without itself making GET requests to external web pages. I'm looking for the same in JavaScript, but I've only found jsdom and JSSoup (which seems unused), and if I'm correct, they only work by making requests.

I want a JavaScript library that lets me parse an entire HTML tree without running into CORS policy errors; that is, one that only parses markup I already have rather than making a request.

How can I do this?

Omar asked Mar 14 '26 23:03



1 Answer

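In a browser context, you can use DOMParser. It parses an HTML string you already have and makes no network request, so CORS does not come into play: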

const html = "<h1>title</h1>";
const parser = new DOMParser();
const parsed = parser.parseFromString(html, "text/html");
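// parseFromString returns a full document (html > head + body), so select the element directly
console.log(parsed.querySelector("h1").textContent); // "title"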

In Node.js, you can use node-html-parser:

import { parse } from 'node-html-parser';

const html = "<h1>title</h1>";
const parsed = parse(html);
console.log(parsed.firstChild.innerText); // "title"
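
Both parsers return a tree you can query with CSS selectors through querySelectorAll, which covers much of what Beautiful Soup's find_all and select are used for. As a rough sketch of that idea (extractLinks is a hypothetical helper name, not part of either library, and it assumes the parsed elements expose textContent and getAttribute, as DOM elements do and as recent node-html-parser versions appear to):

```javascript
// Hypothetical helper: collects the text and href of every <a> in a parsed tree.
// Works on anything exposing querySelectorAll, e.g. a DOMParser document
// or (by assumption) a node-html-parser root.
function extractLinks(root) {
  return Array.from(root.querySelectorAll("a")).map((a) => ({
    text: a.textContent.trim(),
    href: a.getAttribute("href"),
  }));
}

// Browser-only usage; skipped where DOMParser is not defined (e.g. plain Node.js).
if (typeof DOMParser !== "undefined") {
  const doc = new DOMParser().parseFromString(
    '<p><a href="/a">First</a> <a href="/b">Second</a></p>',
    "text/html"
  );
  console.log(extractLinks(doc));
}
```

In Node.js, the same helper could presumably be called as extractLinks(parse(html)) with the parse function from node-html-parser. Either way, the input is a string already in memory, so no request is made.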
pjones123 answered Mar 17 '26 14:03



