
"&lang" misinterpreted in URL

I am developing for phones with JavaScript disabled. My code looks like this:

<a href="someurl?var=a&lang=english">Link 1</a>
<a href="someurl?lang=english&var=a">Link 2</a>

But the browser interprets the URLs as:

someurl?var=a%e2%8c%a9=english         (Link 1, incorrect)
someurl?lang=english&var=a             (Link 2 works just fine !)

It seems like a&lang=english is being converted to a%e2%8c%a9=english, i.e. the &lang part is turned into %e2%8c%a9.

Could someone explain why this is happening?

Ankit Rustagi asked May 12 '14 10:05



2 Answers

In HTML, the & character represents the start of a character reference.

If you try to specify an invalid character reference, then browsers will perform error recovery and treat it as an ampersand instead.

From the HTML DTD:

<!ENTITY lang     CDATA "&#9001;" -- left-pointing angle bracket = bra,
                                 U+2329 ISOtech -->

… so &lang is a valid character reference, and the browser replaces it with that character rather than treating the & as a literal ampersand.

To include an ampersand character as data, use the character reference for an ampersand: &amp;
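As a rough illustration of both points, here is a small Python sketch. It assumes the pages are generated by a script; the helper name href_attribute is hypothetical and not from the question. It looks up lang in the HTML 4.01 entity table that ships with Python's standard library, then escapes the URL before placing it in the href attribute.

```python
from html import escape
from html.entities import name2codepoint

# name2codepoint holds the HTML 4.01 entity definitions;
# "lang" is a defined name there, which is why &lang gets decoded.
lang_codepoint = name2codepoint["lang"]


def href_attribute(url: str) -> str:
    """Escape a raw URL for use inside an HTML attribute value.

    Hypothetical helper for illustration: it turns & into &amp;
    (and also escapes <, > and quotes).
    """
    return escape(url, quote=True)


raw_url = "someurl?var=a&lang=english"
escaped_url = href_attribute(raw_url)
link = f'<a href="{escaped_url}">Link 1</a>'

print(hex(lang_codepoint))
print(link)
```

The browser decodes &amp; back to a plain & when it reads the attribute, so the server still receives someurl?var=a&lang=english.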

Quentin answered Sep 30 '22 10:09



By HTML 4.01 rules, the &lang entity reference denotes the character U+2329 LEFT-POINTING ANGLE BRACKET “〈”. In UTF-8, that character is encoded as the bytes 0xE2 0x8C 0xA9, so in a URL it gets percent-encoded as %e2%8c%a9. The a in front of it in a%e2%8c%a9 is simply the value of var.
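The byte sequence can be checked with a few lines of Python. This is only a sketch that reproduces the encoding steps described above; the variable names are made up for illustration, and the result is lowercased to match the form shown in the question.

```python
from urllib.parse import quote

bra = "\u2329"  # U+2329 LEFT-POINTING ANGLE BRACKET

# The character's UTF-8 representation is three bytes.
utf8_bytes = bra.encode("utf-8")

# quote() percent-encodes the UTF-8 bytes (uppercase hex digits).
percent_encoded = quote(bra)

# Rebuild the mangled URL from the question: "a" is the value of var,
# followed by the encoded character that replaced "&lang".
mangled = "someurl?var=a" + percent_encoded.lower() + "=english"

print(utf8_bytes)
print(percent_encoded)
print(mangled)
```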

Nowadays, most browsers don’t work that way. Specifically, in an attribute value such as href, the reference &lang without a terminating semicolon is not recognized when followed by an equals sign = (even though it is valid HTML 4.01 in that context).

To deal with browsers that may follow the old rules, as well as in order to comply with syntax rules independently of HTML version, escape each occurrence of the ampersand “&” as &amp;—it is safest to do this for all occurrences of “&” as a data character, in attribute values and elsewhere.

Depending on the server-side software that processes the URLs once the links have been followed, you might be able to use an unproblematic character like “;” instead of “&” as the separator.
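Whether “;” works depends entirely on the server side. As one example, and assuming a Python backend (the question does not say which server software is used), recent Python 3 releases let urllib.parse.parse_qs take a separator argument, so a query string joined with “;” could be parsed like this:

```python
from urllib.parse import parse_qs

# Query string as it would arrive from a link such as
# <a href="someurl?var=a;lang=english">, which contains no "&" to escape.
query = "var=a;lang=english"

# Assumption: the server is told explicitly to split on ";".
params = parse_qs(query, separator=";")

print(params)
```

Other frameworks may not support “;” at all, so check how your server splits query strings before relying on this.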

Jukka K. Korpela answered Sep 30 '22 12:09
