I am trying to put a URL into a data- attribute. In particular:
<tr data-href="/page.cfm?Id=#EncodeForHTMLAttribute(ID)#">
...
Or perhaps it should be:
<tr data-href="/page.cfm?Id=#EncodeForURL(ID)#">
...
Note that ID can contain special characters.
Edit:
Much later, I am going to do this:
$("tr").click(function() { window.location = $(this).data("href"); });
Let us analyze some scenarios:
<!--- our "tricky" ID --->
<cfset ID = '"><script>alert("my evil script");</script><div foo="'>
<!--- we are closing the data-href attribute, injecting our JS and starting a new tag to absorb the rest of the original tag --->
<cfoutput>
<div data-href="page.cfm?Id=#ID#"></div>
<!--- [data-href] is printed as: page.cfm?Id="><script>alert("my evil script");</script><div foo=" --->
</cfoutput>
An alert dialog appears with "my evil script".
Never leave user input unencoded! (You already knew that.)
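For comparison, here is a minimal sketch of the same mistake in JavaScript, assuming Node.js. buildRowUnencoded is a hypothetical helper standing in for the unencoded cfoutput above, and the regex only mimics how an HTML parser ends a double-quoted attribute value at the next quote.

```javascript
// The "tricky" ID from the CFML example above.
const id = '"><script>alert("my evil script");</script><div foo="';

// Hypothetical helper that interpolates the ID with NO encoding (the mistake).
function buildRowUnencoded(id) {
  return `<div data-href="page.cfm?Id=${id}"></div>`;
}

const html = buildRowUnencoded(id);
console.log(html);
// <div data-href="page.cfm?Id="><script>alert("my evil script");</script><div foo=""></div>

// A double-quoted attribute value ends at the next '"', so this is all
// that is left of data-href; the rest is parsed as new markup.
const attrValue = html.match(/data-href="([^"]*)"/)[1];
console.log(attrValue); // page.cfm?Id=
```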
Note: You should always encode the full value of an HTML attribute, not just parts of it.
<!--- our "tricky" ID --->
<cfset ID = "&a=b?c">
<!--- this value contains reserved characters that will confuse the query string parser --->
<cfoutput>
<div data-href="#encodeForHtmlAttribute("page.cfm?Id=#ID#")#"></div>
<!--- [data-href] is printed as: page.cfm?Id=&a=b?c --->
<script>
var attr = document.getElementsByTagName('div')[0].getAttribute('data-href');
console.log(attr); <!--- page.cfm?Id=&a=b?c --->
<cfif structIsEmpty(URL)> <!--- test related: to prevent infinite redirection --->
location.href = attr;
</cfif>
</script>
</cfoutput>
<cfdump var="#URL#">
When requesting page.cfm, we are redirected to page.cfm?Id=&a=b?c, the plain value of the data-href attribute. However, dumping the URL scope shows these key-value pairs:
Id: [empty string]
a: b?c
This is to be expected: the & starts a new parameter, so Id ends up empty and a=b?c becomes a parameter of its own. The query string parser cannot distinguish between the literal meaning and the technical purpose of these characters. I recently answered this here.
Encoding the output for a single context isn't sufficient when the value lives in multiple contexts (here: HTML and URL/query string).
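The same split can be reproduced with the WHATWG URL and URLSearchParams parser (global in browsers and in Node.js 10+). This is a sketch, not what ColdFusion runs internally; https://example.com/ is a placeholder base, assumed only so the relative URL can be parsed.

```javascript
// Value of data-href as the browser hands it to location.href: the HTML
// encoding has already been undone, so &, = and ? are back in raw form.
const attr = "page.cfm?Id=&a=b?c";

// https://example.com/ is only a placeholder base so the relative URL parses.
const url = new URL(attr, "https://example.com/");

// The query is split at every "&" and at the first "=" of each pair.
for (const [key, value] of url.searchParams) {
  console.log(`${key}: ${JSON.stringify(value)}`);
}
// Id: ""
// a: "b?c"
```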
<!--- our "tricky" ID --->
<cfset ID = 'a&b="><script>alert("my evil script");</script><div foo="'>
<!--- we are mixing in both contexts now --->
<cfoutput>
<div data-href="page.cfm?Id=#encodeForUrl(ID)#"></div>
<!--- [data-href] is printed as: page.cfm?Id=a%26b%3D%22%3E%3Cscript%3Ealert%28%22my+evil+script%22%29%3B%3C%2Fscript%3E%3Cdiv+foo%3D%22 --->
<script>
var attr = document.getElementsByTagName('div')[0].getAttribute('data-href');
console.log(attr); <!--- page.cfm?Id=a%26b%3D%22%3E%3Cscript%3Ealert%28%22my+evil+script%22%29%3B%3C%2Fscript%3E%3Cdiv+foo%3D%22 --->
<cfif structIsEmpty(URL)> <!--- test related: to prevent infinite redirection --->
location.href = attr;
</cfif>
</script>
</cfoutput>
<cfdump var="#URL#">
When requesting page.cfm, we are redirected to page.cfm?Id=a%26b%3D%22%3E%3Cscript%3Ealert%28%22my+evil+script%22%29%3B%3C%2Fscript%3E%3Cdiv+foo%3D%22, the plain value of the data-href attribute. Dumping the URL scope now shows a single key-value pair:
Id: a&b="><script>alert("my evil script");</script><div foo="
This time, the query string parser can distinguish between the literal meaning and the technical purpose of the characters. But what about the HTML context here? The percent-encoding done by encodeForUrl() doesn't conflict with HTML's reserved characters: &, <, >, " and ' are all replaced by %XX sequences, and % itself has no technical purpose in HTML, so it doesn't break anything.
Theoretically we are done here. There is no need to HTML-encode the URL-encoded value since there is no overlap of the two encodings.
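A sketch of this scenario in JavaScript, assuming Node.js 10+ or a browser. URLSearchParams stands in for encodeForUrl(); for this particular ID it yields the same string as the CFML comment above, but it is not ColdFusion's implementation, and the base URL is again a placeholder.

```javascript
const id = 'a&b="><script>alert("my evil script");</script><div foo="';

// Percent-encode the value for the query string context.
// URLSearchParams uses application/x-www-form-urlencoded rules:
// only A-Z a-z 0-9 * - . _ stay literal and a space becomes "+".
const query = new URLSearchParams({ Id: id }).toString();
const href = "page.cfm?" + query;
console.log(href);
// page.cfm?Id=a%26b%3D%22%3E%3Cscript%3Ealert%28%22my+evil+script%22%29%3B%3C%2Fscript%3E%3Cdiv+foo%3D%22

// No character that is special in an HTML attribute survives the encoding...
console.log(/[&<>"']/.test(href)); // false

// ...and the query string parser recovers the original value unchanged.
const parsed = new URL(href, "https://example.com/").searchParams.get("Id");
console.log(parsed === id); // true
```

encodeURIComponent() would also work for the query string, but it leaves ' (as well as ! ( ) ~ *) unencoded, so the "no overlap with HTML" argument would then only hold inside double-quoted attributes.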
<!--- our "tricky" ID --->
<cfset ID = 'a&b="><script>alert("my evil script");</script><div foo="'>
<!--- we are mixing in both contexts again --->
<cfoutput>
<div data-href="#encodeForHtmlAttribute("page.cfm?Id=#encodeForUrl(ID)#")#"></div>
<!--- [data-href] is printed as: page.cfm?Id=a%26b%3D%22%3E%3Cscript%3Ealert%28%22my+evil+script%22%29%3B%3C%2Fscript%3E%3Cdiv+foo%3D%22 --->
<script>
var attr = document.getElementsByTagName('div')[0].getAttribute('data-href');
console.log(attr); <!--- page.cfm?Id=a%26b%3D%22%3E%3Cscript%3Ealert%28%22my+evil+script%22%29%3B%3C%2Fscript%3E%3Cdiv+foo%3D%22 --->
<cfif structIsEmpty(URL)> <!--- test related: to prevent infinite redirection --->
location.href = attr;
</cfif>
</script>
</cfoutput>
<cfdump var="#URL#">
When requesting page.cfm, we are redirected to page.cfm?Id=a%26b%3D%22%3E%3Cscript%3Ealert%28%22my+evil+script%22%29%3B%3C%2Fscript%3E%3Cdiv+foo%3D%22, the plain value of the data-href attribute. Dumping the URL scope again shows a single key-value pair:
Id: a&b="><script>alert("my evil script");</script><div foo="
Nothing seems to have changed, right? Not exactly. This is what our data-href now looks like in the final HTML output:
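page.cfm&#x3f;Id&#x3d;a&#x25;26b&#x25;3D&#x25;22&#x25;3E&#x25;3Cscript&#x25;3Ealert&#x25;28&#x25;22my&#x2b;evil&#x2b;script&#x25;22&#x25;29&#x25;3B&#x25;3C&#x25;2Fscript&#x25;3E&#x25;3Cdiv&#x2b;foo&#x25;3D&#x25;22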
As you can see, the percent-encoding is now additionally encoded for HTML (each % was encoded to its hexadecimal character reference &#x25;).
The value is now safe for both contexts.
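A minimal sketch of the nested encoding in JavaScript, assuming Node.js 10+. encodeHtmlAttr and decodeHexRefs are hypothetical, simplified helpers: the first only loosely imitates encodeForHtmlAttribute() (the real function also emits named entities such as &quot;), the second mimics what the HTML parser does to the attribute value. URLSearchParams stands in for encodeForUrl().

```javascript
// Hypothetical, simplified stand-in for encodeForHtmlAttribute(): letters,
// digits and , . _ - pass through; every other code point becomes a
// hexadecimal character reference (&#xHH;).
function encodeHtmlAttr(value) {
  return value.replace(/[^A-Za-z0-9,._-]/gu,
    (ch) => "&#x" + ch.codePointAt(0).toString(16) + ";");
}

// Mimics what the HTML parser does to the attribute value; it only
// understands the hex references produced above.
function decodeHexRefs(value) {
  return value.replace(/&#x([0-9a-f]+);/gi,
    (_, hex) => String.fromCodePoint(parseInt(hex, 16)));
}

const id = 'a&b="><script>alert("my evil script");</script><div foo="';

// Inner context first (query string), then the outer one (HTML attribute).
const href = "page.cfm?" + new URLSearchParams({ Id: id }).toString();
const attr = encodeHtmlAttr(href);
const html = `<div data-href="${attr}"></div>`;
console.log(html);

// Every % of the percent-encoding is now written as &#x25;.
console.log(attr.startsWith("page.cfm&#x3f;Id&#x3d;a&#x25;26b")); // true

// Unwinding in reverse order restores the original ID.
const fromHtml = decodeHexRefs(attr);
const fromQuery = new URL(fromHtml, "https://example.com/").searchParams.get("Id");
console.log(fromHtml === href, fromQuery === id); // true true
```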
There are more encodings that could mix in (think of encodeForJavaScript()), but you get the idea. It's always about which characters in a value require encoding so that they are not misinterpreted for their technical purpose. This can get as wild as 3 to 4 nested encodings. But then again: usually those encodings do not conflict with each other, so it is not always strictly necessary to encode the value for every one of its contexts.
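As a sketch of three nested contexts, assume the URL ends up in an inline onclick handler instead of a data- attribute (a hypothetical variation, not the question's code). The helpers are simplified and hypothetical, JSON.stringify() is only a rough stand-in for encodeForJavaScript(), and Node.js 10+ is assumed.

```javascript
// Hypothetical, simplified HTML attribute encoder (hex references only).
function encodeHtmlAttr(value) {
  return value.replace(/[^A-Za-z0-9,._-]/gu,
    (ch) => "&#x" + ch.codePointAt(0).toString(16) + ";");
}

// Mimics the HTML parser decoding those references.
function decodeHexRefs(value) {
  return value.replace(/&#x([0-9a-f]+);/gi,
    (_, hex) => String.fromCodePoint(parseInt(hex, 16)));
}

const id = 'a&b="><script>alert("my evil script");</script><div foo="';
const prefix = "location.href = ";

// Layer 1 (innermost): query string value.
const href = "page.cfm?" + new URLSearchParams({ Id: id }).toString();
// Layer 2: JavaScript string literal. JSON.stringify() serves as a rough
// stand-in for encodeForJavaScript(); it is not a replica of it.
const js = prefix + JSON.stringify(href) + ";";
// Layer 3 (outermost): HTML attribute.
const html = `<div onclick="${encodeHtmlAttr(js)}"></div>`;

// The same layers peeled off by hand, from the outside in
// (a browser would evaluate the handler instead of calling JSON.parse).
const handlerSource = decodeHexRefs(html.match(/onclick="([^"]*)"/)[1]); // HTML
const literal = handlerSource.slice(prefix.length, -1);                   // JS
const recovered = new URL(JSON.parse(literal), "https://example.com/")
  .searchParams.get("Id");                                                // URL
console.log(recovered === id); // true
```

Here the JavaScript layer only adds the surrounding quotes, because the percent-encoded URL contains no quotes or backslashes; that is the kind of non-conflicting overlap described above.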