
PHP: HTML Attribute Encoding / JavaScript Decoding

What's the proper way to encode untrusted data for HTML attribute context? For example:

<input type="hidden" value="<?php echo $data; ?>" />

I usually use htmlentities() or htmlspecialchars() to do this:

<input type="hidden" value="<?php echo htmlentities($data); ?>" />

However, I recently ran into an issue where this was breaking my application when the data I needed to pass was a URL which needed to be handed off to JavaScript to change the page location:

<input id="foo" type="hidden" value="foo?bar=1&amp;baz=2" />
<script>
    // ...
    window.location = document.getElementById('foo').value;
    // ...
</script>

In this case, foo is a C program, and it doesn't understand the encoded characters in the URL and segfaults.

I can simply grab the value in JavaScript and do something like value.replace('&amp;', '&'), but that seems kludgy, and only works for ampersands.

So, my question is: is there a better way to go about the encoding or decoding of data that gets injected into HTML attributes?

I have read all of OWASP's XSS Prevention Cheatsheet, and it sounds to me like as long as I'm careful to quote my attributes, then the only character I need to encode is the quote itself (") - in which case, I could use something like str_replace('"', '&quot;', ...) - but, I'm not sure if I'm understanding it properly.

FtDRbwLXw6 asked May 01 '12 20:05




1 Answer

Your current method of using htmlentities() or htmlspecialchars() is the right approach.

The example you provided is correct HTML:

<input id="foo" type="hidden" value="foo?bar=1&amp;baz=2" />

The ampersand in the value attribute does indeed need to be HTML encoded, otherwise your HTML is invalid. Most browsers would parse it correctly with an & in there, but that doesn't change the fact that it's invalid and you are correct to be encoding it.
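To illustrate what the encoded attribute should look like, here is a minimal JavaScript sketch of roughly the escaping that htmlspecialchars() performs on the server when called with ENT_QUOTES. The escapeAttr helper is a hypothetical name used only for illustration (it is not a built-in), and it covers just the five characters shown.

```javascript
// Hypothetical helper, not a built-in: escapes the characters that matter
// inside a quoted HTML attribute value.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')   // must run first so later entities are not re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#039;');
}

const url = 'foo?bar=1&baz=2';
const html = '<input id="foo" type="hidden" value="' + escapeAttr(url) + '" />';

console.log(html);
// <input id="foo" type="hidden" value="foo?bar=1&amp;baz=2" />
```

Note that the quote character is escaped as well, so a value containing " cannot close the attribute early.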

Your problem lies not in the encoding of the value, which is good, but in the fact that you're using JavaScript code that doesn't decode it properly.

In fact, I'm surprised at this, because your JS code is accessing the DOM, and the DOM should be returning the decoded values.

I wrote a JSfiddle to prove this to myself: http://jsfiddle.net/qRd4Z/

Running this gives me an alert box with the decoded value, as I expected. Changing it to console.log also gives the result I expect. So I'm not sure why you're getting different results. Perhaps you're using a different browser? It might be worth specifying which one you're testing with. Or perhaps you've double-encoded the entities by mistake? Can you confirm that's not the case?
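The double-encoding possibility can be sketched as follows. This is only an illustration under stated assumptions: escapeAttr and decodeOnce are hypothetical helpers, and decodeOnce stands in for the single round of entity decoding that a browser's HTML parser applies to an attribute value (the real parser handles many more entities than these five).

```javascript
// Hypothetical helper: escapes the five characters relevant to attribute values.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#039;');
}

// Hypothetical helper: undoes exactly one level of the escaping above.
function decodeOnce(value) {
  return String(value)
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&#0*39;/g, "'")
    .replace(/&amp;/g, '&');  // last, so "&amp;lt;" decodes to "&lt;" and not "<"
}

const rawUrl = 'foo?bar=1&baz=2';
const once = escapeAttr(rawUrl);   // encoded correctly, one time
const twice = escapeAttr(once);    // encoded a second time by mistake

console.log(decodeOnce(once));     // foo?bar=1&baz=2
console.log(twice);                // foo?bar=1&amp;amp;baz=2
console.log(decodeOnce(twice));    // foo?bar=1&amp;baz=2
```

If document.getElementById('foo').value in the browser still contains a literal &amp;, the last case is the likely explanation: the value was encoded twice on the server, and the DOM removed only one level.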

Spudley answered Oct 20 '22 06:10