HTML/URL decode on a string that was encoded multiple times

Q: What is the difference between HtmlEncode and UrlEncode?

HTML encoding turns a character such as "<" into "&lt;", which is the encoded representation of the less-than sign. URL encoding does the same job for URLs, where the set of special characters is different, although there is some overlap.
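
To make the difference concrete, here is a small sketch that runs the same input through both encoders. It assumes WebUtility.HtmlEncode (System.Net) and Uri.EscapeDataString (System); the class and method names are hypothetical and only for illustration.

```csharp
using System;
using System.Net;

public static class EncodingComparison
{
    // HTML-encodes text so it can be placed inside markup.
    public static string ForHtml(string text) => WebUtility.HtmlEncode(text);

    // Percent-encodes text so it can be used as a URL component.
    public static string ForUrl(string text) => Uri.EscapeDataString(text);
}

// EncodingComparison.ForHtml("<")   returns "&lt;"
// EncodingComparison.ForUrl("<")    returns "%3C"
// EncodingComparison.ForHtml("a&b") returns "a&amp;b"
// EncodingComparison.ForUrl("a&b")  returns "a%26b"
```

The ampersand is one of the overlapping characters: both schemes treat it as special, but each writes it differently.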

Q: What is %2C in HTML?

%2C is the URL-encoded (percent-encoded) form of a comma (,).

We have a string that is read from a web page. Because browsers tolerate unencoded special characters (e.g. an ampersand), some pages encode them and some do not, so it is quite possible that some of our stored data was encoded once and some of it multiple times.

Is there a clean way to make sure my string is fully decoded, no matter how many times it was encoded?

Here is what we are using now:

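// Extension method, so it must live in a static class. Requires: using System.Web;
public static string HtmlDecode(this string input)
{
     // Decode repeatedly until another pass no longer changes the string.
     var temp = HttpUtility.HtmlDecode(input);
     while (temp != input)
     {
         input = temp;
         temp = HttpUtility.HtmlDecode(input);
     }
     return input;
}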

We use the same approach with UrlDecode.
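
The UrlDecode variant is not shown in the question, so the following is only a sketch of what it might look like. It assumes WebUtility.UrlDecode from System.Net (the original may well use HttpUtility instead); the name UrlDecodeFully and the maxPasses cap are hypothetical additions, the latter to keep hostile input from forcing an unbounded number of passes.

```csharp
using System;
using System.Net;

public static class RepeatedDecoding
{
    // Hypothetical helper: decodes until the value stops changing
    // or until maxPasses decoding passes have been made.
    public static string UrlDecodeFully(string input, int maxPasses = 10)
    {
        if (string.IsNullOrEmpty(input))
        {
            return input;
        }

        string current = input;
        for (int pass = 0; pass < maxPasses; pass++)
        {
            string decoded = WebUtility.UrlDecode(current);
            if (decoded == current)
            {
                break; // stable: another pass would change nothing
            }
            current = decoded;
        }
        return current;
    }
}

// RepeatedDecoding.UrlDecodeFully("%252C") returns ","
// ("%252C" decodes to "%2C", which decodes to ",")
```

Note that WebUtility.UrlDecode also turns "+" into a space, so data that legitimately contains a plus sign is changed even by a single pass.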

Q: What happens if you double encode a URL?

By using double encoding it's possible to bypass security filters that only decode user input once. The second decoding process is executed by the backend platform or modules that properly handle encoded data, but don't have the corresponding security checks in place.
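
A minimal sketch of that bypass, assuming WebUtility.UrlDecode from System.Net. The "filter" and "backend" here are hypothetical stand-ins invented for illustration, not any real product's behavior.

```csharp
using System;
using System.Net;

public static class DoubleEncodingDemo
{
    // Hypothetical naive filter: decodes once, then rejects angle brackets.
    public static bool NaiveFilterAllows(string rawInput)
    {
        string decodedOnce = WebUtility.UrlDecode(rawInput);
        return !decodedOnce.Contains("<");
    }

    // Hypothetical later layer: by the time it uses the value,
    // the input has been decoded a second time.
    public static string WhatBackendSees(string rawInput)
    {
        return WebUtility.UrlDecode(WebUtility.UrlDecode(rawInput));
    }
}

// DoubleEncodingDemo.NaiveFilterAllows("%3Cscript%3E")     returns false
// DoubleEncodingDemo.NaiveFilterAllows("%253Cscript%253E") returns true
// DoubleEncodingDemo.WhatBackendSees("%253Cscript%253E")   returns "<script>"
```

After one pass the double-encoded input is still "%3Cscript%3E", which contains no "<", so the check passes even though the fully decoded value is exactly what the filter was meant to block.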

Q: What is %2f in a URL?

URL encoding converts characters into a format that can be transmitted over the Internet (W3Schools). So "/" is a separator, but "%2f" is an ordinary character that simply represents "/" inside an element of your URL.
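
The separator-versus-character distinction can be sketched like this, assuming Uri.UnescapeDataString from System. SplitPath is a hypothetical helper: it splits on literal "/" first and only then decodes each segment, so an encoded slash stays inside its segment.

```csharp
using System;
using System.Linq;

public static class SlashEncodingDemo
{
    // Splits on literal "/" separators, then percent-decodes each segment.
    public static string[] SplitPath(string path)
    {
        return path
            .Split('/')
            .Select(segment => Uri.UnescapeDataString(segment))
            .ToArray();
    }
}

// SlashEncodingDemo.SplitPath("docs/a%2Fb") returns { "docs", "a/b" }
// SlashEncodingDemo.SplitPath("docs/a/b")   returns { "docs", "a", "b" }
```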

That's probably the best approach, honestly. The real solution would be to rework your code so that everything is encoded exactly once everywhere, which means it only ever needs to be decoded once.
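
One reason single encoding matters: decoding until the string is stable cannot tell over-encoded data apart from text that was meant to contain an entity. A small sketch, assuming WebUtility from System.Net; the class and method names are hypothetical.

```csharp
using System.Net;

public static class SingleEncodingDemo
{
    // Encode once, decode once: the original text survives unchanged.
    public static string RoundTripOnce(string text)
    {
        return WebUtility.HtmlDecode(WebUtility.HtmlEncode(text));
    }

    // Decode-until-stable loop, equivalent in spirit to the question's code.
    public static string DecodeUntilStable(string input)
    {
        string current = input;
        string decoded = WebUtility.HtmlDecode(current);
        while (decoded != current)
        {
            current = decoded;
            decoded = WebUtility.HtmlDecode(current);
        }
        return current;
    }
}

// string original = "Write &lt; to show a less-than sign";
// SingleEncodingDemo.RoundTripOnce(original) returns the original text
// SingleEncodingDemo.DecodeUntilStable(WebUtility.HtmlEncode(original))
//     returns "Write < to show a less-than sign"
```

The loop over-decodes here because the stored text legitimately contained "&lt;"; with strict single encoding and single decoding that ambiguity does not arise.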

Your code seems to decode HTML strings correctly by checking repeatedly until nothing changes.

However, if the input HTML is malformed, i.e. not encoded properly, the result may be unexpected: bad input might not be decoded properly no matter how many times it passes through this method.

A quick check with two encoded strings, one fully encoded and one only partially encoded, yielded the following results.

"&lt;b&gt;" will decode to "<b>"

"&lt;b&gt" will decode to "<b&gt"

In case this is helpful to anyone, here is a recursive version for strings that were HTML-encoded multiple times (I find it a bit easier to read):

public static string HtmlDecode(string input) {
    string decodedInput = WebUtility.HtmlDecode(input);

    if (input == decodedInput) {
        return input;
    }

    return HtmlDecode(decodedInput);
}

WebUtility is in the System.Net namespace.

Tags:

c#

sasjaq

3 Answers

Haney

LakshmiNarayanan

Dimitar Dimitrov
