
JavaScript automatically converts some special characters

I need to extract an HTML substring with JS, based on character positions. I store special characters HTML-encoded.

For example:

HTML

<div id="test"><p>l&ouml;sen &amp; gr&uuml;&szlig;en</p></div>

Text

lösen & grüßen

My problem lies in the JS part, for example when I try to extract the fragment "l&ouml;", which has the HTML-dependent starting position of 3 and the end position of 9 inside the <div> block. JS seems to convert some special characters internally, so the range from 3 to 9 is wrongly interpreted as "lösen " and not "l&ouml;". Other special characters like &amp; are not affected by this.

So my question is: does anyone know why JS behaves this way? Characters like &auml; or &ouml; are converted, while characters like &amp; or &nbsp; are left as they are. Is there any way to avoid this conversion?

I've set up a fiddle to demonstrate this: JSFiddle

Thanks for any help!

EDIT:

Maybe I explained it in a confusing way, sorry for that. What I want is the HTML:

<p>l&ouml;sen &amp; gr&uuml;&szlig;en</p>

Every special character should stay encoded and only the HTML tags should remain as real markup, as in the HTML above.

But JS converts &ouml; or &uuml; into ö or ü automatically, which is what I need to avoid.

noplacetoh1de asked Nov 22 '12 13:11



1 Answer

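That's because the browser (and not JavaScript) decodes the entities into their respective Unicode characters when it parses the page. When the markup is serialized again, only the characters that have to be escaped in HTML are written back as entities (e.g. &amp;, &lt; and &gt;), so &ouml; comes back as ö.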

So by the time you inspect .innerHTML, it no longer contains exactly what was in the original page source. You could reverse this process, but that requires the full map of character <-> entity pairs, which is just not practical.
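To illustrate what reversing the process could look like, here is a minimal sketch, not a complete solution. The function name `reencodeEntities` and the `NAMED_ENTITIES` map are hypothetical, and the map deliberately covers only the three characters from the question. Any other non-ASCII character falls back to a numeric character reference, so the result matches the original source only if that source used exactly these named entities.

```javascript
// Hypothetical partial map: only the characters from the question's example.
const NAMED_ENTITIES = new Map([
  ["ö", "&ouml;"],
  ["ü", "&uuml;"],
  ["ß", "&szlig;"],
]);

// Re-encode the non-ASCII characters of a string as returned by .innerHTML.
function reencodeEntities(html) {
  // Array.from iterates by code point, so astral characters stay intact.
  return Array.from(html, (ch) => {
    const codePoint = ch.codePointAt(0);
    if (codePoint < 128) return ch; // ASCII (tags, &amp;, &lt;, ...) is left untouched
    if (NAMED_ENTITIES.has(ch)) return NAMED_ENTITIES.get(ch);
    return `&#${codePoint};`; // fallback: numeric character reference
  }).join("");
}

// The string below is what .innerHTML yields for the example <div>.
const serialized = "<p>lösen &amp; grüßen</p>";
console.log(reencodeEntities(serialized));
// prints: <p>l&ouml;sen &amp; gr&uuml;&szlig;en</p>
```

Position-based extraction would then run on the re-encoded string. If the original source mixed named and numeric references, the offsets can still differ, which is why keeping the raw source string around separately may be the more robust option.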

Ja͢ck answered Oct 26 '22 17:10