What is the best way to parse HTML in Google Apps Script?

    var page = UrlFetchApp.fetch(contestURL);
    var doc = XmlService.parse(page);

The code above gives a parse error. However, if I replace the XmlService class with the deprecated Xml class and set the lenient flag, it parses the HTML properly.

    var page = UrlFetchApp.fetch(contestURL);
    var doc = Xml.parse(page, true);

The problem seems to be caused mostly by the lack of CDATA sections around the JavaScript in the HTML; the parser complains with the following error:

The entity name must immediately follow the '&' in the entity reference.

Even if I remove all the <script>(.*?)</script> blocks using a regex, it still complains because the <br> tags aren't closed. Is there a clean way of parsing HTML into a DOM tree?
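
For illustration, the regex cleanup attempted above could be sketched roughly as follows. This is only a sketch under assumptions: the function names tidyForXml and parseTidied and the list of void tags are hypothetical, and regex-based cleanup is fragile. In particular, HTML-only named entities such as &nbsp; are left untouched, so an XML parser may still reject the result.

```javascript
// Hypothetical helper: make an HTML string somewhat more XML-friendly.
// A best-effort sketch, not a real HTML parser.
function tidyForXml(html) {
  return html
    // Drop <script> blocks, whose unescaped '&' and '<' upset XML parsers.
    .replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, '')
    // Self-close common void elements such as <br> and <img ...>.
    .replace(/<(br|hr|img|input|meta|link)\b([^>]*?)\s*\/?>/gi, '<$1$2 />')
    // Escape any '&' that does not already start an entity reference.
    .replace(/&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#\d+|#x[0-9a-fA-F]+);)/g, '&amp;');
}

// Intended use in Apps Script (contestURL comes from the question).
function parseTidied(contestURL) {
  var html = UrlFetchApp.fetch(contestURL).getContentText();
  return XmlService.parse(tidyForXml(html));
}
```

Real-world pages often contain other things XML forbids (unquoted attributes, unclosed <p> or <li> tags), so this kind of cleanup tends to work only on fairly tidy markup.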

805

asked Oct 18 '13 17:10

copperhead

2 Answers

I ran into this exact same problem. I was able to work around it by first using the deprecated Xml.parse (which still works), then selecting the body XmlElement and passing its XML string to the new XmlService.parse method:

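    var page = UrlFetchApp.fetch(contestURL);
    // Lenient parse with the deprecated Xml service, which tolerates malformed HTML
    var doc = Xml.parse(page, true);
    // Serialize the parsed <body> back to a well-formed XML string
    var bodyHtml = doc.html.body.toXmlString();
    // Re-parse that string with the current XmlService
    doc = XmlService.parse(bodyHtml);
    var root = doc.getRootElement();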

Note: This solution will no longer work if the deprecated Xml.parse is removed from Google Apps Script.
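
Once root is available, the tree can be walked with the regular XmlService element methods. The sketch below collects every descendant element with a given tag name. findElementsByName and logLinkTexts are hypothetical helper names; the code relies only on Element.getName(), Element.getChildren() and Element.getText().

```javascript
// Hypothetical helper: depth-first search for descendants named `name`.
function findElementsByName(element, name) {
  var matches = [];
  element.getChildren().forEach(function (child) {
    if (child.getName().toLowerCase() === name.toLowerCase()) {
      matches.push(child);
    }
    // Recurse so nested matches are found as well.
    matches = matches.concat(findElementsByName(child, name));
  });
  return matches;
}

// Example: log the text of every link under a root element.
function logLinkTexts(root) {
  findElementsByName(root, 'a').forEach(function (link) {
    Logger.log(link.getText());
  });
}
```

Matches come back in document order, since each element is recorded before its own descendants are searched.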

112

answered Sep 22 '22 21:09

Justin Bicknell

In 2021, the best way I know of to parse HTML on the .gs side is the Cheerio library for Google Apps Script. To add it to your project:

1. Click + next to Libraries
2. Enter 1ReeQ6WO8kKNxoaA_O0XEQ589cIrRvEBA9qcWpNqdOP17i47u6N9M5Xh0 as the script ID
3. Click "Look up"
4. Click Add

Sample usage:

    const contentText = UrlFetchApp.fetch('https://www.somesite.com/').getContentText();
    const $ = Cheerio.load(contentText);
    $('.some-class').first().text();

That's it -- this is probably the closest we'll get to doing jQuery-like DOM selection in GAS. The .first() is important or else you may extract more content than you expected (think of it as using querySelector() instead of querySelectorAll()).
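
To illustrate the difference .first() makes, here is a small sketch. The helper names extractFirstText, extractAllText and demo are hypothetical. The calls they use (.first(), .text(), .map(), .get()) are standard Cheerio methods, and the sketch assumes the library is available under the identifier Cheerio as in the sample above.

```javascript
// Text of the first matching element only (like querySelector()).
function extractFirstText($, selector) {
  return $(selector).first().text().trim();
}

// One string per matching element (like querySelectorAll()).
function extractAllText($, selector) {
  return $(selector)
    .map(function (i, el) { return $(el).text().trim(); })
    .get(); // .get() converts the Cheerio collection into a plain array
}

// Intended use in Apps Script; the URL and class name are placeholders.
function demo() {
  const html = UrlFetchApp.fetch('https://www.somesite.com/').getContentText();
  const $ = Cheerio.load(html);
  Logger.log(extractFirstText($, '.some-class'));
  Logger.log(extractAllText($, '.some-class'));
}
```

Calling .text() directly on a multi-element selection returns the combined text of every match, which is why the per-element .map() is used when each value is needed separately.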

Credit where credit is due: https://github.com/tani/cheeriogs

answered Sep 21 '22 21:09

thdoan

Tags: javascript, html, regex, google-apps-script
