I am developing for phones with JavaScript disabled. My code looks like this:
<a href="someurl?var=a&lang=english">Link 1</a>
<a href="someurl?lang=english&var=a">Link 2</a>
But the browser interprets the URLs as:
someurl?var=a%e2%8c%a9=english (Link 1, incorrect)
someurl?lang=english&var=a (Link 2, works fine)
It seems like &lang=english is being converted to %e2%8c%a9=english.
Could someone explain why this is happening?
In HTML, the & character marks the start of a character reference.
If you write an invalid character reference, browsers perform error recovery and treat the & as a literal ampersand instead.
From the HTML DTD:
<!ENTITY lang CDATA "&#9001;" -- left-pointing angle bracket = bra,
                                  U+2329 ISOtech -->
So &lang is not an invalid character reference: it is a valid one, and the browser decodes it rather than falling back to a literal ampersand.
To include an ampersand character as data, use the character reference for an ampersand: &amp;
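The DTD entry can be cross-checked from Python (used here only for illustration; the question involves no server-side language). The standard-library table html.entities.name2codepoint maps HTML 4.01 entity names to Unicode code points, so a sketch like this shows what &lang and &amp; stand for:

```python
from html.entities import name2codepoint

# name2codepoint maps HTML 4.01 entity names (without "&" or ";") to code points.
code_point = name2codepoint["lang"]
print(code_point, hex(code_point))  # 9001 0x2329
print(f"U+{code_point:04X}")        # U+2329, LEFT-POINTING ANGLE BRACKET

# The entity to use when you want a literal ampersand.
print(name2codepoint["amp"])        # 38, the code point of "&"
```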
By HTML 4.01 rules, the &lang entity reference denotes the character U+2329 LEFT-POINTING ANGLE BRACKET “〈”. In UTF-8 that character is encoded as the bytes 0xE2 0x8C 0xA9, so in a URL it gets %-encoded as %e2%8c%a9. That is how var=a&lang=english turns into var=a%e2%8c%a9=english.
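A simplified sketch of that old behaviour, in Python. legacy_href is a hypothetical helper, not a browser API: it decodes only &lang, by naive string replacement, and then %-encodes non-ASCII characters with urllib.parse.quote. quote emits uppercase hex digits; %E2 and %e2 are equivalent in a URL.

```python
from urllib.parse import quote


def legacy_href(raw: str) -> str:
    """Hypothetical model of an HTML 4.01-style browser handling an href.

    Only the one reference from this question is modelled: "&lang" is decoded
    to U+2329, then the result is UTF-8 %-encoded as a URL. A real parser
    matches whole entity names; this naive replace does not.
    """
    decoded = raw.replace("&lang", "\u2329")
    # Keep the URL delimiters; non-ASCII characters get %-encoded.
    return quote(decoded, safe="/?=&")


print("\u2329".encode("utf-8").hex())             # e28ca9
print(legacy_href("someurl?var=a&lang=english"))  # someurl?var=a%E2%8C%A9=english
print(legacy_href("someurl?lang=english&var=a"))  # someurl?lang=english&var=a
```

Link 2 survives because its lang parameter follows “?”, not “&”, so no &lang reference appears in it.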
Nowadays, most browsers don’t work that way. Specifically, in a URL, the reference &lang is not recognized when it is followed by an equals sign “=” (even though it is valid HTML 4.01 in that context).
To deal with browsers that may follow the old rules, and to comply with the syntax rules regardless of HTML version, escape each occurrence of the ampersand “&” as &amp;. It is safest to do this for all occurrences of “&” as a data character, in attribute values and elsewhere.
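If the markup is generated on the server, the escaping can be done automatically. A minimal sketch, assuming Python (the question does not say what generates the page), using the standard-library html.escape:

```python
import html

url = "someurl?var=a&lang=english"

# html.escape replaces &, < and > (and quotes, since quote=True by default)
# with character references, so every "&" becomes "&amp;".
href = html.escape(url)
link = f'<a href="{href}">Link 1</a>'
print(link)  # <a href="someurl?var=a&amp;lang=english">Link 1</a>

# Decoding &amp; yields "&" again, so the browser requests the intended URL.
assert html.unescape(href) == url
```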
Depending on the server-side software that processes the URLs when the links are followed, you might be able to use an unproblematic character such as “;” instead of “&” as the parameter separator.
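For example, assuming the server parses query strings with Python 3.10 or later, urllib.parse.parse_qs accepts a separator argument, so a link written as someurl?var=a;lang=english could be parsed like this (the server setup is an assumption for illustration):

```python
from urllib.parse import parse_qs

# Query string from a link such as <a href="someurl?var=a;lang=english">.
query = "var=a;lang=english"

# Assumes Python 3.10+, where parse_qs splits only on the given separator.
params = parse_qs(query, separator=";")
print(params)  # {'var': ['a'], 'lang': ['english']}
```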