
Instruct Scrapy to ignore the content length of a site

Question

How can I ignore the content length of a response in Scrapy?

Explanation

Consider this curl command:

curl -u <user:pass> http://data.icecat.biz/export/level4/NL/files.index.xml

It currently fails because Icecat sets the Content-Length header incorrectly.

We can work around this by telling curl to ignore the Content-Length header with its --ignore-content-length option:

curl --ignore-content-length -u <user:pass> http://data.icecat.biz/export/level4/NL/files.index.xml

And everything works fine!

However, I have no idea how to do this in Scrapy. Neither Google nor the documentation turned up anything.

Before I dig into the Scrapy code to fix this myself: has anybody already done this?

Pullie asked Oct 17 '25 13:10


1 Answer

This is fixed in Scrapy 1.5, which introduced a new setting, DOWNLOAD_FAIL_ON_DATALOSS. Set it to False in your project settings.
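A minimal sketch of that change, assuming a standard Scrapy project where the settings live in settings.py (the file name comes from Scrapy's project template, not from the question):

```python
# settings.py (requires Scrapy 1.5 or later)

# By default (True) Scrapy fails a download whose body is shorter than the
# announced Content-Length. Setting it to False passes such "broken"
# responses on to the spider callbacks instead, much like
# curl --ignore-content-length.
DOWNLOAD_FAIL_ON_DATALOSS = False
```

If you would rather not change this project-wide, the same key can go in a spider's custom_settings dict, or be set for a single request via the download_fail_on_dataloss key in Request.meta. Responses accepted this way carry the "dataloss" flag, so a callback can test "dataloss" in response.flags and decide whether the body (here, an XML index) is complete enough to parse.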

Billy Jhon answered Oct 20 '25 03:10

