Getting unparsed (raw) HTML with JavaScript

Tags:

javascript

html

I need to get the actual HTML code of an element in a web page.

For example, if the actual HTML code inside the element is "How to&nbsp;fix"

Running this JavaScript:

document.getElementById('myE').innerHTML

Gives me "How to fix" which is the parsed HTML.

How can I get the unparsed "How to&nbsp;fix" using JavaScript?

asked Oct 11 '10 10:10

Melina

2 Answers

You cannot get the actual HTML source of part of your web page.

When you give a web browser an HTML page, it parses the HTML into some DOM nodes that are the definitive version of your document as far as the browser is concerned. The DOM keeps the significant information from the HTML, such as the fact that you used the Unicode character U+00A0 Non-Breaking Space before the word fix, but not the irrelevant information that you wrote it as an entity reference (&nbsp;) rather than just typing the raw character.

When you ask the browser for an element node's innerHTML, it doesn't give you the original HTML source that was parsed to produce that node, because it no longer has that information. Instead, it generates new HTML from the data stored in the DOM. The browser decides on how to format that HTML serialisation; different browsers produce different HTML, and chances are it won't be the same way you formatted it originally.
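To make the information loss concrete, here is a toy model in plain JavaScript. It is not the browser's parser: toyParseText and the small ENTITIES table are hypothetical names invented for this sketch, and they cover only a handful of entity references. All it illustrates is that two different spellings of the source collapse to the same stored text, so nothing downstream can tell them apart.

```javascript
// Toy model only (assumption: a tiny fixed entity table stands in for the
// real HTML parser). Maps a few entity references to the characters they mean.
var ENTITIES = {
  '&nbsp;': '\u00A0',
  '&amp;': '&',
  '&lt;': '<',
  '&gt;': '>',
  '&quot;': '"'
};

// Decode the supported entity references, leaving everything else untouched.
function toyParseText(source) {
  return source.replace(/&(?:nbsp|amp|lt|gt|quot);/g, function (ref) {
    return ENTITIES[ref];
  });
}

// Two different source spellings of the same text...
var fromEntity = toyParseText('How to&nbsp;fix');
var fromRawChar = toyParseText('How to\u00A0fix');

// ...end up as identical strings once parsed.
console.log(fromEntity === fromRawChar); // true
```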

In particular,

element names may be upper- or lower-cased;
attributes may not be in the same order as you stated them in the HTML;
attribute quoting may not be the same as in your source. IE often generates unquoted attributes that aren't even valid HTML; all you can be sure of is that the innerHTML generated will be safe to use in the same browser by writing it to another element's innerHTML;
it may not use entity references for anything but characters that would otherwise be impossible to include directly in text content: ampersands, less-thans and attribute-value-quotes. Instead of returning &nbsp; it may simply give you the raw U+00A0 character.

You may not be able to see that it is a non-breaking space, but it still is one, and if you insert that HTML into another element it will act as one. You shouldn't need to rely anywhere on a non-breaking space character being entity-escaped to &nbsp;. If you do, for some reason, you can get that by doing:

// el is the element whose serialised HTML you want to re-escape
var x = el.innerHTML.replace(/\xA0/g, '&nbsp;');

but that's only escaping U+00A0 and not any of the other thousands of possible Unicode characters, so it's a bit questionable.
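If you do go down this road, a slightly broader sketch would re-escape every non-ASCII character rather than just U+00A0. The helper name escapeNonAscii is made up for this example, and using decimal numeric character references for everything except U+00A0 is an assumption. The output is still not your original source, only an ASCII-safe re-encoding of whatever the browser serialised.

```javascript
// Hypothetical helper (not a built-in): re-escape non-ASCII characters in an
// HTML string such as the value read from el.innerHTML.
function escapeNonAscii(html) {
  // The "u" flag makes the class match whole code points, so characters
  // outside the Basic Multilingual Plane are not split into surrogate halves.
  return html.replace(/[^\x00-\x7F]/gu, function (ch) {
    if (ch === '\u00A0') {
      return '&nbsp;'; // keep the familiar named reference for U+00A0
    }
    return '&#' + ch.codePointAt(0) + ';'; // decimal numeric reference
  });
}

console.log(escapeNonAscii('How to\u00A0fix')); // How to&nbsp;fix
```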

If you really really need to get your page's actual source HTML, you can make an XMLHttpRequest to your own URL (location.href) and get the full, unparsed HTML source in the responseText. There is almost never a good reason to do this.
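For completeness, a minimal sketch of that last-resort approach might look like the following. The function name fetchPageSource and its callback shape are assumptions made for this example; only XMLHttpRequest, responseText and location.href come from the answer above. Note that it returns the source of the whole page, not of one element, and that a dynamic server may not send the same markup on the second request.

```javascript
// Hypothetical helper: fetch a URL and hand its unparsed text to a callback.
// Intended for a browser, where XMLHttpRequest and location exist.
function fetchPageSource(url, callback) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', url);
  xhr.onload = function () {
    if (xhr.status >= 200 && xhr.status < 300) {
      callback(null, xhr.responseText); // raw markup as sent by the server
    } else {
      callback(new Error('Request failed with status ' + xhr.status));
    }
  };
  xhr.onerror = function () {
    callback(new Error('Network error while fetching ' + url));
  };
  xhr.send();
}

// Usage in a page (left commented out because it needs a browser):
// fetchPageSource(location.href, function (err, source) {
//   if (!err) console.log(source.indexOf('&nbsp;') !== -1);
// });
```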

answered Oct 18 '22 17:10

bobince

What you have should work:

Element test:

<div id="myE">How to&nbsp;fix</div>

JavaScript test:

alert(document.getElementById("myE").innerHTML); //alerts "How to&nbsp;fix"

Make sure that wherever you're using the result isn't rendering &nbsp; as a space, which is likely the case. If you want to show it somewhere that's designed for HTML, you'll need to escape it.
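As a rough illustration of that escaping step, the sketch below turns the markup characters into entity references so the string is displayed literally when it is written into an HTML context. The name escapeHtml is hypothetical; it is a hand-rolled helper, not a built-in function.

```javascript
// Hypothetical helper: escape the characters that are special in HTML so a
// string of markup is displayed literally instead of being parsed.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;') // must run first so the entities added below are not re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

console.log(escapeHtml('How to&nbsp;fix')); // How to&amp;nbsp;fix
```

In a browser, assigning the string to an element's textContent instead of its innerHTML has the same visible effect without any manual escaping.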

answered Oct 18 '22 19:10

Nick Craver
