I have a QString with some HTML in it... is there an easy way to strip the HTML from it? I basically want just the actual text content.
<i>Test:</i><img src="blah.png" /><br> A test case
Would become:
Test: A test case
I'm curious to know if Qt has a string function or utility for this.
PHP's strip_tags() function strips HTML, XML, and PHP tags from a string. Note: HTML comments are always stripped; this cannot be changed with the allow parameter. Note: this function is binary-safe.
There are several ways to strip all HTML tags from a string in JavaScript: use the replace() function, or read the .textContent or .innerText property of an HTML DOM element.
HTML tags can be removed from a given string by using the replaceAll() method of the String class.
stripHtml(html): changes the provided HTML string into a plain text string by converting <br>, <p>, and <div> to line breaks, stripping all other tags, and converting escaped characters into their display values.
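For anyone who wants that three-step behaviour in C++ without pulling in Qt, here is a rough sketch using only the standard library. Both htmlToText and replaceAll are made-up helper names, not part of Qt or any other library. The sketch assumes that <br> and the closing </p> and </div> tags are the ones that become line breaks, and it only decodes a handful of common entities, so treat it as a starting point rather than a complete converter.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: replaces every occurrence of `from` (non-empty) with `to`.
void replaceAll(std::string &text, const std::string &from, const std::string &to)
{
    for (std::size_t pos = text.find(from); pos != std::string::npos;
         pos = text.find(from, pos + to.size())) {
        text.replace(pos, from.size(), to);
    }
}

// Hypothetical helper, not a Qt or library API.
std::string htmlToText(const std::string &html)
{
    // Step 1 (assumption): <br>, <br/>, </p> and </div> become line breaks.
    static const std::regex lineBreaks(R"(<br\s*/?>|</(?:p|div)\s*>)",
                                       std::regex::icase);
    std::string text = std::regex_replace(html, lineBreaks, "\n");

    // Step 2: drop everything else that looks like a tag.
    static const std::regex anyTag("<[^>]*>");
    text = std::regex_replace(text, anyTag, "");

    // Step 3: decode a few common entities. "&amp;" goes last so that
    // "&amp;lt;" ends up as "&lt;" and is not decoded a second time.
    replaceAll(text, "&lt;", "<");
    replaceAll(text, "&gt;", ">");
    replaceAll(text, "&quot;", "\"");
    replaceAll(text, "&amp;", "&");
    return text;
}
```

Entities are decoded only after the tags are gone, so a decoded "<" is never mistaken for the start of a tag.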
QString s = "<i>Test:</i><img src=\"blah.png\" /><br> A test case";
s.remove(QRegExp("<[^>]*>"));
// s == "Test: A test case"
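The same pattern can be tried outside Qt. Below is a minimal sketch with std::regex on a std::string; stripTags is a hypothetical helper name used only for illustration. In Qt 6, QRegExp is no longer part of the core modules and QRegularExpression is the usual replacement, but the pattern itself stays the same. Keep in mind that this approach does not decode entities such as &amp;, and that a ">" inside an attribute value will end the match early.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: removes anything that looks like <...>.
std::string stripTags(const std::string &html)
{
    // '<', then any run of characters that are not '>', then '>'.
    static const std::regex tagPattern("<[^>]*>");
    return std::regex_replace(html, tagPattern, "");
}
```

Called on the string from the question, stripTags returns "Test: A test case".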
If you don't care that much about performance, QTextDocument
does a pretty good job of converting HTML to plain text.
QTextDocument doc;
doc.setHtml(htmlString);
return doc.toPlainText();
I know this question is old, but I was looking for a quick and dirty way to handle incorrect HTML. The XML parser wasn't giving good results.