
How to add Headers to Scrapy CrawlSpider Requests?

Tags:

python

scrapy

I'm working with the CrawlSpider class to crawl a website and I would like to modify the headers that are sent in each request. Specifically, I would like to add the referer to the request.

As per this question, I checked

response.request.headers.get('Referer', None)

in my response parsing function and the Referer header is not present. I assume that means the Referer is not being submitted in the request (unless the website doesn't return it, I'm not sure on that).

I haven't been able to figure out how to modify the headers of a request. Again, my spider is derived from CrawlSpider. Overriding CrawlSpider's _requests_to_follow or specifying a process_request callback for a rule will not work because the referer is not in scope at those times.

Does anyone know how to modify request headers dynamically?

asked Jan 08 '13 by CatShoes



2 Answers

I hate to answer my own question, but I found out how to do it. You have to enable the spider middleware that populates the Referer header on responses. See the documentation for scrapy.contrib.spidermiddleware.referer.RefererMiddleware.

In short, you need to add this middleware to your project's settings file.

SPIDER_MIDDLEWARES = {
    'scrapy.contrib.spidermiddleware.referer.RefererMiddleware': True,
}

Then in your response parsing method you can use response.request.headers.get('Referer', None) to get the referer.

If you think you understand these middlewares right away, read the docs again, take a break, and then read them again. I found them to be very confusing.

answered Sep 27 '22 by CatShoes

You can pass the Referer manually to each request using the headers argument:

yield Request(url, callback=..., headers={'Referer': ...})

RefererMiddleware does the same thing automatically, taking the referrer URL from the previous response.

answered Sep 27 '22 by warvariuc