
Scrapy: How to print request referrer

Tags: python, scrapy

Is it possible to get the request referrer from the response object in the parse function?

Thanks.

DjangoPy asked Aug 21 '12 12:08



2 Answers

The HTTP Referer field is set by the HTTP client in the request headers, not in the response headers, because it tells the server which page the client came from.

It would be rather odd to receive a Referer header in a response.

In Scrapy, however, each Response keeps a reference to the Request that generated it in its request attribute, so the following call:

response.request.headers.get('Referer', None)

returns the Referer header if it was set when the request was made, and None otherwise.

Rostyslav Dzinko answered Oct 19 '22 17:10



The question above was asked a long time ago, and it has been answered well.

However, I thought I would add a different answer in case the answer by Rostyslav Dzinko does not apply/work in your case.

Let's say that you have two different parser methods:

  1. One parser (call it parser_A) parses the list of articles (the list page) to extract links and other information.
  2. Another parser (call it parser_B) extracts article information from the target article (the article page).

If you cannot get the URL of the list page (the referer URL) once you are in parser_B, you can set a custom header in parser_A and pass it along to parser_B, as in the following example:

yield scrapy.Request(url=article_page_url, callback=self.parser_B, dont_filter=True, headers={'referer_url': list_page_url})

Then, in the parser_B method, you can do the following to obtain the list page's URL:

print(response.request.headers.get('referer_url'))

Hope this helps.

btaek answered Oct 19 '22 15:10
