Getting raw text content of HTML element with HTML uninterpreted

I have Googled my brains out and can't figure out how to make this work. Here is what I'm trying to do:

HTML:

<div id=derp>&quot;Hi, my name is..&quot;</div>

Javascript:

var div = document.getElementById('derp');
alert(div.innerHTML);
alert(div.innerText);
alert(div.textContent);

All of those alerts interpret and return the &quot; as " in the resulting string. I want to get the raw text with &quot; uninterpreted.

They all return:

"Hi, my name is.."

When I want to get:

&quot;Hi, my name is..&quot;

Is there a way to do this? Preferably without trying to use a regex to replace every instance of " with &quot;.

It's kind of a long story of what I'm trying to do, but simply using replace() to search and replace every instance of " would be a headache to implement because of other regex matching/parsing that needs to occur.

Thanks in advance to any JavaScript wizards who can save my sanity!

Trey asked Mar 14 '13 20:03


2 Answers

To quote bobince:

When you ask the browser for an element node's innerHTML, it doesn't give you the original HTML source that was parsed to produce that node, because it no longer has that information. Instead, it generates new HTML from the data stored in the DOM. The browser decides on how to format that HTML serialisation; different browsers produce different HTML, and chances are it won't be the same way you formatted it originally.

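In summary: none of innerHTML, innerText, text, textContent, or nodeValue will give you the unparsed text, because the DOM no longer holds it.

The only ways to get it are to re-encode the characters yourself (for example with a regex replace), or to request the page's own source again via Ajax and read the markup from the response, but the latter is bad practice.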
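As a rough sketch of the re-encoding route, something like the following could work. The helper name encodeEntities and the entity map are made up for illustration, and the string literal stands in for what div.textContent would return in the browser. Note that this only reproduces the original markup if the source escaped every such character the same way; the DOM cannot tell a literal " from &quot;.

```javascript
// Hypothetical helper: re-encode the characters HTML treats specially
// as entities. The name and the entity map are illustrative assumptions.
var ENTITIES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;"
};

function encodeEntities(str) {
  // Look up each matched character in the map above.
  return str.replace(/[&<>"']/g, function (c) {
    return ENTITIES[c];
  });
}

// In the browser this would be div.textContent; a literal stands in here.
var decoded = '"Hi, my name is.."';
console.log(encodeEntities(decoded)); // &quot;Hi, my name is..&quot;
```

Because the ampersand is in the map too, text that already contains something like &quot; as visible characters comes out as &amp;quot; rather than being left alone.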

gkiely answered Sep 18 '22 23:09


A few days ago I put together a bin with several different approaches: http://jsbin.com/urazer/4/edit

My favorite:

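// Sample input containing characters that are special in HTML.
var text = "<a href='#' title=\"Foo\"></a>";
// Replace each special character with its numeric character reference.
var html = text.replace(/[<&>'"]/g, function(c) {
  return "&#" + c.charCodeAt(0) + ";";
});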
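To check that numeric references like &#34; carry the same information as the original characters, here is a small round-trip sketch. The function names encodeNumeric and decodeNumeric are hypothetical wrappers, not part of any library, and the decoder handles only decimal references, not named entities such as &quot;.

```javascript
// Hypothetical wrapper around the replace() call shown above.
function encodeNumeric(str) {
  return str.replace(/[<&>'"]/g, function (c) {
    return "&#" + c.charCodeAt(0) + ";";
  });
}

// Decodes only decimal references such as &#34;, not named entities.
function decodeNumeric(str) {
  return str.replace(/&#(\d+);/g, function (match, code) {
    return String.fromCharCode(Number(code));
  });
}

var text = "<a href='#' title=\"Foo\"></a>";
var encoded = encodeNumeric(text);
console.log(encoded); // &#60;a href=&#39;#&#39; title=&#34;Foo&#34;&#62;&#60;/a&#62;
console.log(decodeNumeric(encoded) === text); // true
```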
yckart answered Sep 19 '22 23:09