I have some questions about sitemap.xml generation, and about Django's sitemap framework in particular.
Let's say I have a blog application which has post_detail pages with each post's content and a bunch of 'helper' pages like 'view by tag', 'view by author', etc.
I could probably have a custom django.conf.urls.defaults.url function to register url-mapping for the sitemap... What do you think?
Thank you.
How a sitemap is used is dictated by the search engine. Some will only index what you have in the sitemap, while others will use it as a starting point and crawl the entire site based on cross-linking.
As for including non-generated pages, we just created a subclass of django.contrib.sitemaps.Sitemap that reads a plain-text file with one URL per line. Something like:
import datetime
import re

from django.contrib.sitemaps import Sitemap


class StaticSitemap(Sitemap):
    priority = 0.8
    # Note: evaluated once, when the module is imported
    lastmod = datetime.datetime.now()

    def __init__(self, filename):
        self._urls = []
        try:
            f = open(filename)  # text mode, so each line is a str
        except IOError:
            return  # missing/unreadable file: leave the URL list empty
        with f:
            for x in f:
                x = re.sub(r"\s*#.*$", '', x)  # strip comments
                x = x.strip()  # clean leading/trailing whitespace
                if not x:
                    continue  # ignore blank lines
                x = x.replace(' ', '%20')  # convert spaces
                if not x.startswith('/'):
                    x = '/' + x
                self._urls.append(x)

    def items(self):
        return self._urls

    def location(self, obj):
        # Each item is already a URL path
        return obj
You can invoke it with something like this in your main sitemap routine:
sitemap['static'] = StaticSitemap(settings.DIR_ROOT + '/sitemap.txt')
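For orientation, here is a rough standard-library sketch of the kind of XML such a section ends up as. It is not Django's own rendering code: render_urlset, the example.com base URL and the two sample paths are made up for illustration, and only the loc and priority fields are shown.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def render_urlset(paths, base_url, priority):
    """Build a minimal <urlset> document from a list of URL paths."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for path in paths:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base_url + path
        ET.SubElement(url, "priority").text = str(priority)
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical base URL and paths, for illustration only
xml_text = render_urlset(["/tour/", "/contact/"], "https://example.com", 0.8)
print(xml_text)
```

Each path returned by items() becomes one url entry, with the site's domain prepended to the value returned by location().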
And our sitemap.txt file looks something like this:
# One URL per line.
# All paths start from root - i.e., with a leading /
# Blank lines are OK.
/tour/
/podcast_archive/
/related_sites/
/survey/
/youtube_videos/
/teachers/
/workshops/
/workshop_listing_info/
/aboutus/
/history/
/investment/
/business/
/contact/
/privacy_policy/
/graphic_specs/
/help_desk/
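If you want to check a sitemap.txt without loading Django, the same cleaning rules can be pulled out into a plain function. This is only a sketch: parse_sitemap_lines and the sample text are hypothetical names and data, not part of the class above.

```python
import re


def parse_sitemap_lines(lines):
    """Return cleaned URL paths from an iterable of text lines."""
    urls = []
    for line in lines:
        line = re.sub(r"\s*#.*$", "", line)  # drop comments
        line = line.strip()                  # trim surrounding whitespace
        if not line:
            continue                         # skip blank lines
        line = line.replace(" ", "%20")      # encode spaces
        if not line.startswith("/"):
            line = "/" + line                # force a leading slash
        urls.append(line)
    return urls


# Hypothetical sample in the same format as sitemap.txt
sample = """\
# One URL per line.

/tour/
podcast_archive/   # leading slash is added
/help desk/
"""
urls = parse_sitemap_lines(sample.splitlines())
print(urls)
```

One thing to keep in mind with this format: because everything after a # is treated as a comment, a URL containing a # fragment would be cut off at that point.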