Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is "&reg" being rendered as "®" without the bounding semicolon

I've been running into a problem that was revealed through our Google adwords-driven marketing campaign. One of the standard parameters used is "region". When a user searches and clicks on a sponsored link, Google generates a long URL to track the click and sends a bunch of stuff along in the referrer. We capture this for our records, and we've noticed that the "Region" parameter is coming through incorrectly. What should be

http://ravercats.com/meow?foo=bar&region=catnip 

is instead coming through as:

http://ravercats.com/meow?foo=bar®ion=catnip 

I've verified that this occurs in all browsers. It's my understanding that HTML entity syntax is defined as follows:

&VALUE; 

where the leading boundary is the ampersand and the closing boundary is the semicolon. Seems straightforward enough. The problem is that this isn't being respected for the ® entity, and it's wreaking all kinds of havoc throughout our system.

Does anyone know why this is occurring? Is it a bug in the DTD? (I'm looking for the current HTML DTD to see if I can make sense of it) I'm trying to figure out what would be common across browsers to make this happen, thus my looking for the DTD.

Here is a proof you can use. Take this code, make an HTML file out of it and render it in a browser:

<html> <a href="http://foo.com/bar?foo=bar&region=US&register=lowpass&reg_test=fail&trademark=correct">http://foo.com/bar?foo=bar&region=US&register=lowpass&reg_test=fail&trademark=correct</a> </html> 

EDIT: To everyone who's suggesting that I need to escape the entire URL, the example URLs above are exactly that, examples. The real URL is coming directly from Google and I have no control over how it is constructed. These suggestions, while valid, don't answer the question: "Why is this happening".

like image 980
Spanky Avatar asked Mar 20 '13 18:03

Spanky


People also ask

Why is the sky blue short answer?

The Short Answer: Sunlight reaches Earth's atmosphere and is scattered in all directions by all the gases and particles in the air. Blue light is scattered more than the other colors because it travels as shorter, smaller waves. This is why we see a blue sky most of the time.

How is the sky blue?

Violet and blue light have the shortest wavelengths and red light has the longest. Therefore, blue light is scattered more than red light and the sky appears blue during the day. When the Sun is low in the sky during sunrise and sunset, the light has to travel further through the Earth's atmosphere.

Why is the sky blue but space is black?

Since there is virtually nothing in space to scatter or re-radiate the light to our eye, we see no part of the light and the sky appears to be black.


1 Answers

Although valid character references always have a semicolon at the end, some invalid named character references without a semicolon are, for backward compatibility reasons, recognized by modern browsers' HTML parsers.

Either you know what that entire list is, or you follow the HTML5 rules for when & is valid without being escaped (e.g. when followed by a space) or otherwise always escape & as &amp; whenever in doubt.

For reference, the full list of named character references that are recognized without a semicolon is:

AElig, AMP, Aacute, Acirc, Agrave, Aring, Atilde, Auml, COPY, Ccedil, ETH, Eacute, Ecirc, Egrave, Euml, GT, Iacute, Icirc, Igrave, Iuml, LT, Ntilde, Oacute, Ocirc, Ograve, Oslash, Otilde, Ouml, QUOT, REG, THORN, Uacute, Ucirc, Ugrave, Uuml, Yacute, aacute, acirc, acute, aelig, agrave, amp, aring, atilde, auml, brvbar, ccedil, cedil, cent, copy, curren, deg, divide, eacute, ecirc, egrave, eth, euml, frac12, frac14, frac34, gt, iacute, icirc, iexcl, igrave, iquest, iuml, laquo, lt, macr, micro, middot, nbsp, not, ntilde, oacute, ocirc, ograve, ordf, ordm, oslash, otilde, ouml, para, plusmn, pound, quot, raquo, reg, sect, shy, sup1, sup2, sup3, szlig, thorn, times, uacute, ucirc, ugrave, uml, uuml, yacute, yen, yuml

However, it should be noted that only when in an attribute value, named character references in the above list are not processed as such by conforming HTML5 parsers if the next character is a = or a alphanumeric ASCII character.

For the full list of named character references with or without ending semicolons, see here.

like image 64
Alohci Avatar answered Oct 23 '22 12:10

Alohci