I tried running the sitemap spider, but it refused to crawl gzipped sitemaps. It gave the following warning:
[scrapy] WARNING: Ignoring non-XML sitemap
Is there a setting that needs to be enabled to allow parsing of gzipped sitemaps?
I am using Scrapy version 0.15.
Scrapy should automatically unzip the gzipped content.
The relevant code is in contrib/spiders/sitemap.py:
    if isinstance(response, XmlResponse):
        body = response.body
    elif is_gzipped(response):
        body = gunzip(response.body)
    else:
        log.msg("Ignoring non-XML sitemap: %s" % response, log.WARNING)
        return
So the warning means the response was recognized neither as XML nor as gzipped. I think either the XML is malformed, or the file isn't served as gzip with the proper headers. I suggest trying the same spider on a sitemap whose formatting you are sure of.
If you want, I can run a test of my own if you can provide me with your current code. That will allow me to give you a better answer :-).
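To narrow down which of the two it is, you could download the sitemap file yourself and inspect the raw bytes. The following is only a rough sketch using the Python 3 standard library, not Scrapy's own detection logic; the function name diagnose_sitemap is made up for illustration. It checks for the gzip magic bytes, tries to decompress, and then looks at whether the result starts like an XML document.

```python
import gzip
import io
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def diagnose_sitemap(raw_bytes):
    """Hypothetical helper: describe what a downloaded sitemap payload looks like."""
    if raw_bytes[:2] == GZIP_MAGIC:
        try:
            body = gzip.GzipFile(fileobj=io.BytesIO(raw_bytes)).read()
        except (OSError, EOFError, zlib.error) as exc:
            return "gzip magic present but decompression failed: %s" % exc
        kind = "gzipped"
    else:
        body = raw_bytes
        kind = "plain"

    # Skip an optional UTF-8 byte order mark and leading whitespace,
    # then check that the payload starts with an opening angle bracket.
    stripped = body.lstrip(b"\xef\xbb\xbf \t\r\n")
    looks_like_xml = stripped.startswith(b"<")
    return "%s, %s" % (kind, "looks like XML" if looks_like_xml else "not XML")
```

If this reports "gzipped, looks like XML" for your file but the spider still ignores it, the payload itself is probably fine and the problem is more likely in the headers the server sends with it.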