Can I use non-Latin characters in my robots.txt file and sitemap.xml, like this?
robots.txt
User-agent: *
Disallow: /somefolder/
Sitemap: http://www.domainwithåäö.com/sitemap.xml
sitemap.xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>http://www.domainwithåäö.com/</loc></url>
<url><loc>http://www.domainwithåäö.com/subpage1</loc></url>
<url><loc>http://www.domainwithåäö.com/subpage2</loc></url>
</urlset>
Or should I do it like this?
robots.txt
User-agent: *
Disallow: /somefolder/
Sitemap: http://www.xn--domainwith-z5al6t.com/sitemap.xml
sitemap.xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>http://www.xn--domainwith-z5al6t.com/</loc></url>
<url><loc>http://www.xn--domainwith-z5al6t.com/subpage1</loc></url>
<url><loc>http://www.xn--domainwith-z5al6t.com/subpage2</loc></url>
</urlset>
On https://support.google.com/webmasters/answer/183668 Google writes: "Make sure that your URLs follow the RFC-3986 standard for URIs, the RFC-3987 standard for IRIs", so I guess the correct answer is that you have to follow these two standards.
My best guess is that it doesn't matter, because Google considers the two URLs identical. That might also be what the standards say, but I'm not good at reading them, so I can neither confirm nor deny that.
Using the xn-- format works. I haven't tried using Unicode characters to see if that also works.
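If you want to generate the xn-- form instead of typing it by hand, here is a minimal Python sketch. The function name iri_to_uri is made up for this example, and it relies on Python's built-in "idna" codec, which implements the older IDNA 2003 rules; for å, ä and ö the result matches the xn-- name in the question, but other characters (for example ß) may come out differently under IDNA 2008. Treat it as an illustration, not a complete RFC 3987 implementation.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Hypothetical helper: return an ASCII-only form of an http(s) IRI.

    Assumption: any user:password@ part is dropped, which is fine for
    sitemap and robots.txt URLs but not for general use.
    """
    parts = urlsplit(iri)
    # Host: each non-ASCII label becomes an "xn--" Punycode label.
    host = (parts.hostname or "").encode("idna").decode("ascii")
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # Path, query, fragment: percent-encode as UTF-8. "%" is kept safe so
    # escapes that are already present are not encoded a second time.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


print(iri_to_uri("http://www.domainwithåäö.com/sitemap.xml"))
# → http://www.xn--domainwith-z5al6t.com/sitemap.xml
```

A URL that is already in xn-- form passes through unchanged, so you can run every <loc> value and the Sitemap: line through the same function.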