
Canonical tags and UTF8

Would the following 2 canonical link tags be viewed by spiders as pointing to the same URL?

<link rel="canonical" href="http://www.example.com/&#375;" /> - encoded
<link rel="canonical" href="http://www.example.com/ŷ" /> - unencoded

Sam asked Nov 24 '10 11:11


1 Answer

&#375; is a numeric character reference (an HTML entity) for the Unicode character whose code point is 375 in decimal notation. In hexadecimal that is 0x177, so we are talking about U+0177, which is ŷ.
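As a quick check of the claim above, here is a minimal sketch using Python's standard `html` module. It decodes the character reference the way an HTML parser would and compares the result with the unencoded form. The variable names are illustrative only, and this says nothing about how any particular spider is implemented.

```python
import html

# The two href values from the question, as they appear in the HTML source.
encoded_href = "http://www.example.com/&#375;"
unencoded_href = "http://www.example.com/\u0177"  # \u0177 is ŷ

# Decode character references, as an HTML parser does for attribute values.
decoded_href = html.unescape(encoded_href)

# Code point of the decoded character: decimal 375, hexadecimal 0x177.
code_point = ord(html.unescape("&#375;"))

print(decoded_href == unencoded_href)
print(code_point, hex(code_point))
```

After decoding, both strings contain the same single character U+0177, so the comparison holds at the level of Unicode text.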

  • http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
  • http://inamidst.com/stuff/unidata/
  • http://www.fileformat.info/info/unicode/char/0177/index.htm

That means that both URLs are exactly the same if:

  1. They appear in the context of an HTML document, where the character reference is decoded before the URL is used.
  2. The document declares a character set that supports this symbol, and the editor you used to type it saved the file in that same encoding.

If the browser displays ŷ in both cases, the character set is probably correct, but you should verify it.
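To illustrate why the declared character set matters, the sketch below assumes the file was saved as UTF-8 and shows what happens when the same bytes are read with a different encoding (Latin-1 is chosen here only as an example of a mismatch). The numeric reference &#375; is immune to this problem because it is plain ASCII.

```python
# Bytes an editor would write for ŷ when saving the file as UTF-8 (assumed).
raw_bytes = "\u0177".encode("utf-8")
print(raw_bytes)  # b'\xc5\xb7'

# Read back with the matching charset: the original character is recovered.
decoded_ok = raw_bytes.decode("utf-8")

# Read back with a mismatched charset: two unrelated characters appear.
decoded_wrong = raw_bytes.decode("latin-1")

print(decoded_ok == "\u0177")
print(len(decoded_wrong))
```

With a mismatched declaration the unencoded URL no longer ends in ŷ, so the two canonical links would stop pointing to the same address.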

Álvaro González answered Sep 22 '22 02:09