
How to send custom headers in a Scrapy Splash request?

My spider.py file looks like this:

def start_requests(self):
    for url in self.start_urls:
        yield scrapy.Request(
            url,
            self.parse,
            headers={'My-Custom-Header':'Custom-Header-Content'},
            meta={
                'splash': {
                    'args': {
                        'html': 1,
                        'wait': 5,
                    },
                }
            },
        )

And my parse method is as follows:

def parse(self, response):
    print(response.request.headers)

When I run my spider, the following headers are printed:

{
    b'Content-Type': [b'application/json'], 
    b'Accept': [b'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'],
    b'Accept-Language': [b'en'], 
    b'User-Agent': [b'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.2309.372 Safari/537.36'], 
    b'Accept-Encoding': [b'gzip,deflate']
}

As you can see, the output does not include the custom header I added to the Scrapy request.

Can anybody help me add custom header values to this request?

Thanks in advance.

Nadun Perera asked May 14 '19 11:05


1 Answer

If you want Splash to send your headers with its request to the URL you specified, add them to the args dict, alongside html and wait:

meta={
    'splash': {
        'args': {
            'html': 1,
            'wait': 5,
            'headers': {
                'My-Custom-Header': 'Custom-Header-Content',
            },
        },
    }
}
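
If several requests need the same Splash settings, the meta dict above can be built by a small helper. This is only a sketch: `splash_meta` is a hypothetical name, not part of Scrapy or scrapy-splash, and the default `wait` of 5 is simply copied from the question.

```python
def splash_meta(headers, wait=5):
    """Build a Request `meta` dict that passes `headers` to Splash.

    Hypothetical helper (not a Scrapy or scrapy-splash API). The headers
    go inside 'args', next to 'html' and 'wait', as in the snippet above.
    """
    return {
        'splash': {
            'args': {
                'html': 1,
                'wait': wait,
                # Copy so later changes to the caller's dict do not leak in.
                'headers': dict(headers),
            },
        }
    }


# Intended use inside the spider's start_requests (assumes `import scrapy`):
#     yield scrapy.Request(
#         url,
#         self.parse,
#         meta=splash_meta({'My-Custom-Header': 'Custom-Header-Content'}),
#     )
```

The helper only assembles the dictionary, so it can be checked without running a crawl.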
Julius Š. answered Oct 31 '22 17:10