
Can I use non-Latin characters in my robots.txt and sitemap.xml?

Can I use non-Latin characters in my robots.txt file and sitemap.xml, like this?

robots.txt

User-agent: *
Disallow: /somefolder/

Sitemap: http://www.domainwithåäö.com/sitemap.xml

sitemap.xml

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>http://www.domainwithåäö.com/</loc></url>
<url><loc>http://www.domainwithåäö.com/subpage1</loc></url>
<url><loc>http://www.domainwithåäö.com/subpage2</loc></url>
</urlset>

Or should I do it like this?

robots.txt

User-agent: *
Disallow: /somefolder/

Sitemap: http://www.xn--domainwith-z5al6t.com/sitemap.xml

sitemap.xml

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>http://www.xn--domainwith-z5al6t.com/</loc></url>
<url><loc>http://www.xn--domainwith-z5al6t.com/subpage1</loc></url>
<url><loc>http://www.xn--domainwith-z5al6t.com/subpage2</loc></url>
</urlset>
user1087110 asked Nov 10 '22 08:11



1 Answer

On https://support.google.com/webmasters/answer/183668 Google writes: "Make sure that your URLs follow the RFC-3986 standard for URIs, the RFC-3987 standard for IRIs", so I guess the correct answer is that you have to follow these two standards.

My best guess is that it doesn't matter, because Google considers the two URLs identical. That might also be what the standards say, but I'm not good at reading them, so I can neither confirm nor deny that.

Using the xn-- (Punycode) format works. I haven't tried using Unicode characters to see if that also works.
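
If you want to generate the xn-- form rather than type it by hand, here is a minimal sketch in Python using only the standard library. The function name to_ascii_url is made up for this example, and it assumes a plain URL without user:password@ credentials. Note that Python's built-in "idna" codec follows the older IDNA 2003 rules, so unusual hostnames might come out differently than with newer tools.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_ascii_url(url: str) -> str:
    """Return an ASCII-only version of a URL (hypothetical helper).

    The hostname is converted to its xn-- (Punycode) form and any
    non-ASCII characters in the path are percent-encoded as UTF-8.
    Assumption: the URL has no user:password@ part (it would be dropped).
    """
    parts = urlsplit(url)
    host = parts.hostname or ""

    # The built-in "idna" codec encodes each dot-separated label separately.
    ascii_host = host.encode("idna").decode("ascii")

    # Keep an explicit port if the original URL had one.
    netloc = ascii_host if parts.port is None else f"{ascii_host}:{parts.port}"

    # Keep "/" and existing "%" escapes as they are; encode everything else.
    ascii_path = quote(parts.path, safe="/%")

    return urlunsplit(
        (parts.scheme, netloc, ascii_path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(to_ascii_url("http://www.domainwithåäö.com/sitemap.xml"))
```

Running it on the Unicode URLs from the question should give the kind of ASCII URL used in the second robots.txt and sitemap.xml, which is the variant I know works.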

Jan Aagaard answered Jan 04 '23 02:01
