Scrapy: Sitemap spider and gzipped files

Question

I tried running the sitemap spider, but it refused to crawl gzipped sitemaps. It gave the following warning:

[scrapy] WARNING: Ignoring non-XML sitemap

Is there a setting that needs to be enabled to allow parsing of gzipped sitemaps?

I use Scrapy version 0.15.

Sjaak Trekhaak · Accepted Answer

Scrapy should automatically unzip the gzipped content.

See the responsible code in contrib/spiders/sitemap.py:

        if isinstance(response, XmlResponse):
            body = response.body
        elif is_gzipped(response):
            body = gunzip(response.body)
        else:
            log.msg("Ignoring non-XML sitemap: %s" % response, log.WARNING)
            return

I think either the XML is malformed, or the file isn't gzipped with the proper headers. I suggest trying the same spider on a sitemap whose formatting you are sure of.
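To narrow down which of the two causes applies, it can help to inspect the raw sitemap bytes outside of Scrapy. The following is a minimal sketch using only the Python standard library; `diagnose_sitemap` is a hypothetical helper name, not part of Scrapy, and it does not reproduce Scrapy's own header-based checks. It only reports whether the bytes start with the gzip magic number and whether the (decompressed) payload parses as XML.

```python
import gzip
import zlib
import xml.etree.ElementTree as ET

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def diagnose_sitemap(raw: bytes) -> str:
    """Classify raw sitemap bytes (hypothetical helper, not a Scrapy API)."""
    if raw[:2] == GZIP_MAGIC:
        try:
            raw = gzip.decompress(raw)
        except (OSError, EOFError, zlib.error):
            # Looks like gzip but cannot be decompressed.
            return "corrupt gzip"
        kind = "gzip"
    else:
        kind = "not gzip"

    try:
        ET.fromstring(raw)  # raises ParseError on malformed XML
    except ET.ParseError:
        return kind + ", invalid XML"
    return kind + ", valid XML"
```

For example, after saving the sitemap to disk, `diagnose_sitemap(open("sitemap.xml.gz", "rb").read())` (the filename is an assumption) returns one of the strings above. A result of "gzip, valid XML" would suggest the file itself is fine and the problem lies in how the server labels the response.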

If you want, I can run a test of my own if you provide your current code; that would let me give you a better answer :-).

Tags: scrapy, sitemap

Sanket Gupta
