Sitemap for a site with a large number of dynamic subdomains

I'm running a site which allows users to create subdomains. I'd like to submit these user subdomains to search engines via sitemaps. However, according to the sitemaps protocol (and Google Webmaster Tools), a single sitemap can include URLs from a single host only.

What is the best approach?

At the moment I have the following structure:

  1. Sitemap index located at example.com/sitemap-index.xml that lists the sitemaps for each subdomain (all of them located on the same host).
  2. Each subdomain has its own sitemap located at example.com/sitemap-subdomain.xml (this way the sitemap index includes URLs from a single host only).
  3. A sitemap for a subdomain contains URLs from the subdomain only, i.e., subdomain.example.com/*
  4. Each subdomain has subdomain.example.com/robots.txt file:

--

User-agent: *
Allow: /

Sitemap: http://example.com/sitemap-subdomain.xml

--
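
As a rough illustration of the structure above, the index and the per-subdomain sitemaps could be generated along these lines. This is only a sketch in Python, not the site's actual code: the function names `build_sitemap_index` and `build_subdomain_sitemap`, the subdomain names and the page paths are hypothetical, and the file names simply follow the sitemap-subdomain.xml pattern from the list.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(subdomains, root_host="example.com"):
    """Return a sitemap index listing one sitemap per subdomain.

    Every <loc> points at root_host, so the index itself only
    references files on a single host.
    """
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for name in subdomains:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = f"http://{root_host}/sitemap-{name}.xml"
    return ET.tostring(index, encoding="unicode")


def build_subdomain_sitemap(name, paths, root_host="example.com"):
    """Return a sitemap whose URLs all belong to name.root_host."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for path in paths:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = f"http://{name}.{root_host}{path}"
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical subdomains and paths, for illustration only.
    print(build_sitemap_index(["alice", "bob"]))
    print(build_subdomain_sitemap("alice", ["/", "/about"]))
```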

I think this approach complies with the sitemaps protocol; however, Google Webmaster Tools gives errors for the subdomain sitemaps: "URL not allowed. This url is not allowed for a Sitemap at this location."

I've also checked how other sites do it. Eventbrite, for instance, produces sitemaps that contain URLs from multiple subdomains (e.g., see http://www.eventbrite.com/events01.xml.gz). This, however, does not comply with the sitemaps protocol.

What approach do you recommend for sitemaps?

bartekb asked Oct 07 '10 10:10


2 Answers

I recently struggled through this and finally got it working. See this thread for more details:

http://www.google.com/support/forum/p/Webmasters/thread?tid=53c3e4b3ab8d9503&hl=en&fid=53c3e4b3ab8d9503000497bd04ba63cf

Summary:

  • Use DNS verification to verify your site and all its subdomains in one fell swoop.
  • Make the robots.txt on all your subdomains point to the main sitemap on your www domain.
  • You may need to wait several days for Google to update its cached copies of robots.txt on all your subdomains. It will still show errors until then.
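
The robots.txt step could be sketched like this in Python. It is a hypothetical helper, not something taken from the linked thread: the function name `render_robots_txt` and the sitemap URL http://www.example.com/sitemap.xml are assumptions, and the same body would be served from every subdomain.

```python
def render_robots_txt(sitemap_url="http://www.example.com/sitemap.xml"):
    """Return robots.txt content that allows crawling and points
    crawlers at the main sitemap on the www domain.

    The default sitemap_url is a placeholder; substitute the real one.
    """
    lines = [
        "User-agent: *",
        "Allow: /",
        "",
        f"Sitemap: {sitemap_url}",
        "",  # trailing newline at end of file
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Serve this same text as /robots.txt on each subdomain.
    print(render_robots_txt())
```
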
Brian Armstrong answered Oct 21 '22 07:10


Yes, the subdomain restriction is in the sitemaps.org spec, but Google has put some exceptions in place:

  1. Verify all subdomains within your Google Webmaster Tools account (http://www.google.com/support/webmasters/bin/answer.py?answer=75712). Cross-submission of XML sitemaps via Google Webmaster Tools, if submitted via the root of your domain, will then not throw errors for Google.

  2. Within the robots.txt of a subdomain you can point to XML sitemaps on other domains. There will be no cross-submission errors, at least for Google.
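
To check what a subdomain's robots.txt actually advertises, something like the following Python sketch may help. It uses the standard library's `urllib.robotparser` (`site_maps()` requires Python 3.8 or later); the helper names `advertised_sitemaps` and `is_cross_host` are hypothetical, and the sample hosts are placeholders.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def advertised_sitemaps(robots_txt):
    """Return the Sitemap URLs declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


def is_cross_host(sitemap_url, subdomain_host):
    """True if the sitemap lives on a different host than the subdomain."""
    return urlsplit(sitemap_url).hostname != subdomain_host


if __name__ == "__main__":
    sample = (
        "User-agent: *\n"
        "Allow: /\n"
        "\n"
        "Sitemap: http://example.com/sitemap-subdomain.xml\n"
    )
    for url in advertised_sitemaps(sample):
        print(url, is_cross_host(url, "subdomain.example.com"))
```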

Franz Enzenhofer answered Oct 21 '22 08:10