Convert HTML to plain text in JS without browser environment

I have a CouchDB view map function that generates an abstract of a stored HTML document (first x characters of text). Unfortunately I have no browser environment to convert HTML to plain text.

Currently I use this multi-stage regexp:

html.replace(/<style([\s\S]*?)<\/style>/gi, ' ')
    .replace(/<script([\s\S]*?)<\/script>/gi, ' ')
    .replace(/(<(?:.|\n)*?>)/gm, ' ')
    .replace(/\s+/gm, ' ');

While it's a very good filter, it's obviously not a perfect one, and some leftovers sometimes slip through. Is there a better way to convert HTML to plain text without a browser environment?

Era asked Mar 02 '13 22:03


1 Answer

This simple regular expression works:

text.replace(/<[^>]*>/g, ''); 

It removes every tag (anything between < and >), not just anchors.

Entities such as &lt; do not contain a literal <, so they cause no problem for this regex.
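To turn the result into a readable abstract, the tag-stripping regex above can be combined with the style/script removal from the question and a small entity decoder. The following is only a minimal sketch, not a full HTML parser: the names decodeEntities, htmlToText and makeAbstract are hypothetical, the entity table covers just a handful of common named entities, and the code assumes reasonably well-formed markup (a > inside an attribute value would still confuse it). It sticks to ES5 syntax on the assumption that it has to run inside a CouchDB map function.

```javascript
// Small, deliberately incomplete table of named entities (an assumption:
// extend it with whatever entities actually occur in your documents).
var NAMED_ENTITIES = { amp: '&', lt: '<', gt: '>', quot: '"', apos: "'", nbsp: '\u00a0' };

// Decode numeric (&#65; / &#x41;) and a few named entities in a single pass,
// so that "&amp;lt;" becomes "&lt;" and is not decoded a second time.
function decodeEntities(text) {
  return text.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z]+);/gi, function (match, body) {
    if (body.charAt(0) === '#') {
      var isHex = body.charAt(1).toLowerCase() === 'x';
      var code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // String.fromCharCode only covers the Basic Multilingual Plane;
      // anything outside it is left untouched.
      return code > 0 && code <= 0xFFFF ? String.fromCharCode(code) : match;
    }
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match; // unknown entity: keep it as written
  });
}

// Strip comments, script/style blocks and remaining tags, then decode
// entities and collapse whitespace.
function htmlToText(html) {
  var text = html
    .replace(/<!--[\s\S]*?-->/g, ' ')
    .replace(/<(script|style)\b[\s\S]*?<\/\1\s*>/gi, ' ')
    .replace(/<[^>]*>/g, ' ');
  return decodeEntities(text).replace(/\s+/g, ' ').replace(/^ | $/g, '');
}

// First maxLength characters of the plain text.
function makeAbstract(html, maxLength) {
  return htmlToText(html).slice(0, maxLength);
}
```

Entities are decoded only after the tags are gone, so a decoded < from &lt; can never be mistaken for the start of a tag. Inside a map function one might call something like makeAbstract(doc.html, 200), where doc.html is an assumed field name.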

Gaël Barbin answered Sep 28 '22 04:09