Instruct Scrapy to ignore the content length of a site

Question


How can I ignore the content length of a response in Scrapy?

Explanation

Consider this curl command:

curl -u <user:pass> http://data.icecat.biz/export/level4/NL/files.index.xml

It currently fails because the content-length header has been set incorrectly by Icecat.

We can work around this by telling curl to ignore the Content-Length header with its --ignore-content-length option:

curl --ignore-content-length -u <user:pass> http://data.icecat.biz/export/level4/NL/files.index.xml

And everything works fine!

However, I have no clue how to do this in Scrapy. Neither Google nor the documentation reveals anything.

Before I dig into the Scrapy code to fix this myself: has somebody already done this?

Billy Jhon · Accepted Answer

This issue is fixed in Scrapy 1.5, which introduces a new setting, DOWNLOAD_FAIL_ON_DATALOSS. Set it to False in your project settings.
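A minimal sketch of the project-level change, assuming a standard Scrapy project layout where settings live in settings.py. With the setting disabled, Scrapy should pass responses whose body does not match the announced Content-Length on to the spider instead of failing the request, and mark them with a "dataloss" flag.

```python
# settings.py of the Scrapy project (requires Scrapy >= 1.5)
# Do not fail requests whose body is shorter/longer than the
# Content-Length header says (e.g. the broken Icecat index file);
# such responses are passed on with 'dataloss' in response.flags.
DOWNLOAD_FAIL_ON_DATALOSS = False
```

The same behaviour can be limited to individual requests with the download_fail_on_dataloss key in Request.meta, or enabled for a single run with scrapy crawl <spider> -s DOWNLOAD_FAIL_ON_DATALOSS=False. In the callback, checking 'dataloss' in response.flags tells you whether the body may be incomplete.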
