One way to unescape HTML entities is to put our escaped text in a text area. This will unescape the text, so we can return the unescaped text afterward by getting the text from the text area. We have an htmlDecode function that takes an input string as a parameter.
HTML encoding converts characters that are not allowed in HTML into character-entity equivalents; HTML decoding reverses the encoding. For example, when embedded in a block of text, the characters < and > are encoded as < and > for HTTP transmission.
Wikipedia has a good expalanation of character encodings and how some characters should be represented in HTML. Load the HTML data to decode from a file, then press the 'Decode' button: Browse: Alternatively, type or paste in the text you want to HTML–decode, then press the 'Decode' button.
The un-escape tool removes all html entities from the text and returns clean text.
I recommend against using the jQuery code that was accepted as the answer. While it does not insert the string to decode into the page, it does cause things such as scripts and HTML elements to get created. This is way more code than we need. Instead, I suggest using a safer, more optimized function.
var decodeEntities = (function() {
// this prevents any overhead from creating the object each time
var element = document.createElement('div');
function decodeHTMLEntities (str) {
if(str && typeof str === 'string') {
// strip script/html tags
str = str.replace(/<script[^>]*>([\S\s]*?)<\/script>/gmi, '');
str = str.replace(/<\/?\w(?:[^"'>]|"[^"]*"|'[^']*')*>/gmi, '');
element.innerHTML = str;
str = element.textContent;
element.textContent = '';
}
return str;
}
return decodeHTMLEntities;
})();
http://jsfiddle.net/LYteC/4/
To use this function, just call decodeEntities("&")
and it will use the same underlying techniques as the jQuery version will—but without jQuery's overhead, and after sanitizing the HTML tags in the input. See Mike Samuel's comment on the accepted answer for how to filter out HTML tags.
This function can be easily used as a jQuery plugin by adding the following line in your project.
jQuery.decodeEntities = decodeEntities;
You could try something like:
var Title = $('<textarea />').html("Chris' corner").text();
console.log(Title);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
JS Fiddle.
A more interactive version:
$('form').submit(function() {
var theString = $('#string').val();
var varTitle = $('<textarea />').html(theString).text();
$('#output').text(varTitle);
return false;
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<form action="#" method="post">
<fieldset>
<label for="string">Enter a html-encoded string to decode</label>
<input type="text" name="string" id="string" />
</fieldset>
<fieldset>
<input type="submit" value="decode" />
</fieldset>
</form>
<div id="output"></div>
JS Fiddle.
Like Robert K said, don't use jQuery.html().text() to decode html entities as it's unsafe because user input should never have access to the DOM. Read about XSS for why this is unsafe.
Instead try the Underscore.js utility-belt library which comes with escape and unescape methods:
_.escape(string)
Escapes a string for insertion into HTML, replacing &
, <
, >
, "
, `
, and '
characters.
_.escape('Curly, Larry & Moe');
=> "Curly, Larry & Moe"
_.unescape(string)
The opposite of escape, replaces &
, <
, >
, "
, `
and '
with their unescaped counterparts.
_.unescape('Curly, Larry & Moe');
=> "Curly, Larry & Moe"
To support decoding more characters, just copy the Underscore unescape method and add more characters to the map.
Original author answer here.
This is my favourite way of decoding HTML characters. The advantage of using this code is that tags are also preserved.
function decodeHtml(html) {
var txt = document.createElement("textarea");
txt.innerHTML = html;
return txt.value;
}
Example: http://jsfiddle.net/k65s3/
Input:
Entity: Bad attempt at XSS:<script>alert('new\nline?')</script><br>
Output:
Entity: Bad attempt at XSS:<script>alert('new\nline?')</script><br>
Here's a quick method that doesn't require creating a div, and decodes the "most common" HTML escaped chars:
function decodeHTMLEntities(text) {
var entities = [
['amp', '&'],
['apos', '\''],
['#x27', '\''],
['#x2F', '/'],
['#39', '\''],
['#47', '/'],
['lt', '<'],
['gt', '>'],
['nbsp', ' '],
['quot', '"']
];
for (var i = 0, max = entities.length; i < max; ++i)
text = text.replace(new RegExp('&'+entities[i][0]+';', 'g'), entities[i][1]);
return text;
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With