 

Passing extra arguments to scrapy.Request()

I want to store all the data (text, hrefs, images) related to a specific website in a single folder. To do that, I need to pass the path of that folder to all the different parsing functions. So I want to pass this path as an extra keyword argument in scrapy.Request(), like this:

   yield scrapy.Request(url=url, dont_filter=True, callback=self.parse,
                        errback=self.errback_function, kwargs={'path': '/path/to_folder'})

But it gives the error TypeError: __init__() got an unexpected keyword argument 'kwargs'

How can I pass that path to the next function?

Asked by Amrit, Oct 05 '17

People also ask

How are arguments passed in Scrapy?

The spider will receive arguments in its constructor. Scrapy puts all the arguments as spider attributes and you can skip the init method completely. Beware use getattr method for getting those attributes so your code does not break.

How do I make a Scrapy request?

Making a request is a straightforward process in Scrapy. To generate a request, you need the URL of the webpage from which you want to extract useful data. You also need a callback function. The callback function is invoked when there is a response to the request.

What does Scrapy request do?

Scrapy uses Request and Response objects for crawling web sites. Typically, Request objects are generated in the spiders and pass across the system until they reach the Downloader, which executes the request and returns a Response object which travels back to the spider that issued the request.

How do you get cookie response from Scrapy?

   self.log(cook1)
   self.log("end cookie2")
   return Request("http://something.net/some/sa/" + response.headers.getlist('Location')[0],
                  cookies={cook1[0]: cook1[1]},
                  callback=self.check_login_response)
   ...


1 Answer

For anyone who may need it...

You can pass extra arguments by using the meta argument, like this:

   yield scrapy.Request(url=url, dont_filter=True, callback=self.parse,
                        errback=self.errback_function, meta={'filepath': filepath})

UPDATE:

Request.cb_kwargs was introduced in version 1.7. Before that, Request.meta was the recommended way to pass information between callbacks. From 1.7 on, Request.cb_kwargs is the preferred way to pass user information, leaving Request.meta for communication with components such as middlewares and extensions.

So for versions >= 1.7, the following would work:

   request = scrapy.Request('http://www.example.com/index.html',
                             callback=self.parse_page2,
                             cb_kwargs=dict(main_url=response.url))

You can refer to this documentation: https://doc.scrapy.org/en/latest/topics/request-response.html#passing-additional-data-to-callback-functions

Answered by Amrit, Oct 06 '22