We have a string that is read from a web page. Because browsers tolerate unencoded special characters (e.g. the ampersand), some pages encode them and some do not, so it is quite likely that some of our stored data was encoded once and some multiple times.
Is there a clean way to be sure my string is fully decoded, no matter how many times it was encoded?
Here is what we are using now:
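public static string HtmlDecode(this string input)
{
    // HttpUtility lives in System.Web; an extension method must be declared in a static class.
    var temp = HttpUtility.HtmlDecode(input);
    // Keep decoding until a pass no longer changes the string.
    while (temp != input)
    {
        input = temp;
        temp = HttpUtility.HtmlDecode(input);
    }
    return input;
}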
We do the same with UrlDecode.
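For illustration, the decode-until-stable loop can be written once and reused for both HTML and URL decoding. This is only a sketch: the class name, the method names, and the maxPasses cap are assumptions added here, and it uses WebUtility from System.Net instead of HttpUtility so that it does not depend on System.Web.

```csharp
using System;
using System.Net;

public static class RepeatedDecoding
{
    // Applies `decode` until the string stops changing or maxPasses is reached.
    // maxPasses is a hypothetical safety cap; it is not part of the original code.
    public static string DecodeUntilStable(string input, Func<string, string> decode, int maxPasses = 10)
    {
        if (input == null) return null;

        string current = input;
        for (int pass = 0; pass < maxPasses; pass++)
        {
            string next = decode(current);
            if (next == current) break; // nothing left to decode
            current = next;
        }
        return current;
    }

    // Convenience wrappers around the built-in single-pass decoders.
    public static string HtmlDecodeFully(string input) =>
        DecodeUntilStable(input, WebUtility.HtmlDecode);

    public static string UrlDecodeFully(string input) =>
        DecodeUntilStable(input, WebUtility.UrlDecode);
}
```

With this sketch, RepeatedDecoding.UrlDecodeFully("%252C") goes through "%2C" and ends at ",". Note that WebUtility.UrlDecode also turns "+" into a space, which may or may not be what you want for your data.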
By using double encoding it's possible to bypass security filters that only decode user input once. The second decoding process is executed by the backend platform or modules that properly handle encoded data, but don't have the corresponding security checks in place.
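A small sketch of why this matters when decoding repeatedly. The filter below is hypothetical (it is not taken from any real product) and only checks for "<" after a single decode; the payload is "<script>" URL-encoded twice.

```csharp
using System;
using System.Net;

public static class DoubleEncodingDemo
{
    // Hypothetical filter: decodes the input once and rejects it if a '<' appears.
    public static bool PassesSingleDecodeFilter(string rawInput)
    {
        string decodedOnce = WebUtility.UrlDecode(rawInput);
        return !decodedOnce.Contains("<");
    }

    // Simulates a later layer that decodes the already-decoded value again.
    public static string DecodeTwice(string rawInput)
    {
        return WebUtility.UrlDecode(WebUtility.UrlDecode(rawInput));
    }
}
```

For the input "%253Cscript%253E" the filter only sees "%3Cscript%3E" and lets it through, while DecodeTwice produces "<script>". If you decode until the string is stable, any validation should be run on the final decoded value, not on an intermediate one.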
URL encoding converts characters into a format that can be transmitted over the Internet (W3Schools). So "/" is actually a separator, but "%2f" is an ordinary character that simply represents the "/" character inside a single element of your URL.
HTML encoding turns the "<" character into "&lt;", which is the encoded representation of the less-than sign. URL encoding does the same, but for URLs, for which the special characters are different, although there is some overlap.
Simple and easy answer: %2C means a comma (,) in a URL.
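To make the difference between the two encodings concrete, here is a minimal sketch using WebUtility from System.Net. The wrapper class and method names are made up for this example.

```csharp
using System.Net;

public static class EncodingComparison
{
    // Encodes text for safe inclusion in HTML markup.
    public static string ForHtml(string text) => WebUtility.HtmlEncode(text);

    // Encodes text for safe inclusion in a URL component.
    public static string ForUrl(string text) => WebUtility.UrlEncode(text);
}
```

For the less-than sign, ForHtml("<") gives "&lt;" while ForUrl("<") gives "%3C". The ampersand is an example of the overlap: it is special in both contexts, so it becomes "&amp;" in HTML and "%26" in a URL.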
That's probably the best approach honestly. The real solution would be to rework your code so that you only singly encode things in all places, so that you could only singly decode them.
Your code seems to decode HTML strings correctly, checking repeatedly until nothing changes.
However, if the input HTML is malformed, i.e. not encoded properly, the result may be unexpected: bad input might not be decoded properly no matter how many times it passes through this method.
A quick check with two strings, one completely encoded and the other only partially encoded, yielded the following results.
"&lt;b&gt;" will decode to "<b>"
"&lt;b>" will decode to "<b>"
In case this is helpful to anyone, here is a recursive version for multiple HTML encoded strings (I find it a bit easier to read):
public static string HtmlDecode(string input) {
    string decodedInput = WebUtility.HtmlDecode(input);
    if (input == decodedInput) {
        return input;
    }
    return HtmlDecode(decodedInput);
}
WebUtility is in the System.Net namespace.