I'm setting the headers the following way:
headers = {
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'cache-control': 'no-cache',
...
}
And I'm making the request like this:
yield scrapy.Request(url='https://myurl.com/', callback=self.parse,
                     headers=headers, cookies=cookies, meta={'proxy': 'http://localhost:8888'})
As a result, Scrapy capitalizes all of these headers, so they look like this (I'm using Charles proxy for debugging):
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Cache-Control: no-cache
This does not work correctly in my case.
If I use curl and send the headers in lowercase:
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
cache-control: no-cache
everything works like a charm.
Is there any way to disable this capitalizing behavior in Scrapy? Thanks for any help!
This can't be done out of the box with Scrapy.
The reason is that Scrapy manages headers in a case-insensitive way by design (see: https://github.com/scrapy/scrapy/blob/master/scrapy/http/headers.py). My guess is that this is done to avoid trouble with duplicate headers.
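To illustrate the idea, here is a minimal sketch of a header container that normalizes key case. It is not Scrapy's actual code: the class name TitleCaseHeaders and its normkey helper are hypothetical, and it assumes the normalization amounts to title-casing each header name, which matches the output you see in Charles. The real class also deals with bytes and multiple values per header, which this sketch leaves out.

```python
class TitleCaseHeaders(dict):
    """Hypothetical stand-in for a header container that normalizes key case."""

    @staticmethod
    def normkey(key):
        # str.title() upper-cases the first letter of each hyphen-separated part.
        return key.title()

    def __setitem__(self, key, value):
        # Every write goes through the normalized key.
        super().__setitem__(self.normkey(key), value)

    def __getitem__(self, key):
        # Every read goes through the same normalization, so lookups ignore case.
        return super().__getitem__(self.normkey(key))


headers = TitleCaseHeaders()
headers['accept'] = 'text/html'
headers['cache-control'] = 'no-cache'
headers['CACHE-CONTROL'] = 'no-store'  # same header, so it overwrites the value above

print(list(headers))             # ['Accept', 'Cache-Control']
print(headers['Cache-control'])  # no-store
```

With a container like this, the original spelling of the key is lost at the moment it is stored, which is why passing lowercase names in the headers dict has no effect on what is sent.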
So you will most likely have to fork Scrapy and roll your own implementation of header handling, or at least do some monkey patching.
But I wonder whether that is really what you need. I know that some websites fingerprint request headers to detect bots, but the capitalized headers generated by Scrapy look much less bot-like than the all-lowercase headers you want to send.