
static pages in Django sitemap framework

I have some doubts regarding sitemap.xml generation and Django's sitemap framework particularly.

Let's say I have a blog application which has post_detail pages with each post's content and a bunch of 'helper' pages like 'view by tag', 'view by author', etc.

  1. Is it mandatory to include each and every page in sitemap.xml, including 'helper' pages? I want all of the 'helper' pages indexed, as they contain many keywords and a lot of text. I know that sitemaps are designed to help index pages by giving the web crawler some directions, not to limit crawling. What is the best practice: include everything, or only the important pages?
  2. If it's okay to have all of the pages in sitemap.xml, what is the best way to submit plain pages that are not stored in the db to the sitemaps framework? One possible way is to have a sitemap class which returns reversed URLs by URL name. But that doesn't seem DRY at all, because I'm going to need to register those URL names a second time (once in the url() function and again in the Sitemap class).

I could probably write a custom django.conf.urls.defaults.url function that also registers the URL mapping for the sitemap... What do you think?

Thank you.

asked Jan 24 '26 03:01 by vshulyak


1 Answer

How a sitemap is used is dictated by the search engine. Some will only index what you have in the sitemap, while others will use it as a starting point and crawl the entire site based on cross-linking.

As for including non-generated pages, we just created a subclass of django.contrib.sitemaps.Sitemap and had it read a plain-text file with one URL per line. Something like:

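import datetime
import re

from django.contrib.sitemaps import Sitemap


class StaticSitemap(Sitemap):
    """Sitemap for static pages listed one per line in a plain-text file."""

    priority = 0.8
    # Evaluated once, when the module is imported, so every URL reports
    # that moment as its last-modified date.
    lastmod = datetime.datetime.now()

    def __init__(self, filename):
        self._urls = []
        try:
            f = open(filename, encoding='utf-8')
        except OSError:
            return  # missing or unreadable file: leave the sitemap empty

        with f:
            for x in f:
                x = re.sub(r"\s*#.*$", '', x)  # strip comments
                x = x.strip()  # clean leading/trailing whitespace
                if not x:
                    continue  # ignore blank lines
                x = x.replace(' ', '%20')  # convert spaces
                if not x.startswith('/'):
                    x = '/' + x  # paths are relative to the site root
                self._urls.append(x)

    def items(self):
        return self._urls

    def location(self, obj):
        return obj  # each item is already a URL path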

You can invoke it with something like this in your main sitemap routine:

sitemap['static'] = StaticSitemap(settings.DIR_ROOT + '/sitemap.txt')

And our sitemap.txt file looks something like this:

# One URL per line.
# All paths start from root - i.e., with a leading /
# Blank lines are OK.

/tour/
/podcast_archive/
/related_sites/
/survey/
/youtube_videos/

/teachers/
/workshops/
/workshop_listing_info/

/aboutus/
/history/
/investment/
/business/
/contact/
/privacy_policy/
/graphic_specs/
/help_desk/
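
To check what a file like this turns into, the line handling can be tried on its own, without Django. The following is only a minimal sketch: `parse_static_urls` is a hypothetical helper name (not part of Django or of the class above), and the sample lines are made up to exercise the comment, blank-line, space and leading-slash rules.

```python
import re


def parse_static_urls(lines):
    """Return cleaned URL paths from an iterable of text lines.

    Hypothetical helper that mirrors the rules used in StaticSitemap.__init__.
    """
    urls = []
    for line in lines:
        # Drop everything from '#' onward, then trim surrounding whitespace.
        line = re.sub(r"\s*#.*$", "", line).strip()
        if not line:
            continue  # skip blank and comment-only lines
        line = line.replace(" ", "%20")  # encode embedded spaces
        if not line.startswith("/"):
            line = "/" + line  # make the path root-relative
        urls.append(line)
    return urls


# Made-up sample input, in the same format as sitemap.txt.
sample = [
    "# One URL per line.\n",
    "\n",
    "/tour/\n",
    "podcast_archive/   # leading slash is added\n",
    "/help desk/\n",
]
print(parse_static_urls(sample))
```

One consequence of treating '#' as a comment marker is that a path containing a URL fragment (such as /page/#section) would be cut off at the '#', so this format only suits plain paths.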
answered Jan 25 '26 20:01 by Peter Rowell


