I want to directly interact with a Scrapy response object in a Jupyter notebook, the same way you can after entering the Scrapy shell by typing scrapy shell "some-url" in the command line.
In a notebook, I can run these commands without error:
import scrapy
request = scrapy.Request("some-url")
response = scrapy.http.Response("some-url")
But request and response both have an empty body property. According to the docs:
Typically, Request objects are generated in the spiders and pass across the system until they reach the Downloader, which executes the request and returns a Response object which travels back to the spider that issued the request.
It seems I'm missing the step where "the Downloader" executes a request object and returns a Response object. I can't figure out how that works.
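For context, here is a sketch of how I understand that cycle can be driven from a standalone script, using a throwaway spider and scrapy.crawler.CrawlerProcess (the spider and the captured dict are just scaffolding for illustration); process.start() runs the Twisted reactor, which can only be started once per process, so this doesn't translate cleanly to a notebook:

import scrapy
from scrapy.crawler import CrawlerProcess

captured = {}  # holds the Response the spider receives

class FetchSpider(scrapy.Spider):
    name = "fetch"
    start_urls = ["some-url"]

    def parse(self, response):
        # By the time parse() is called, the Downloader has executed
        # the Request and built a fully populated Response
        captured["response"] = response

process = CrawlerProcess(settings={"LOG_ENABLED": False})
process.crawl(FetchSpider)
process.start()  # blocks until the crawl finishes
response = captured["response"]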
Does anyone know what happens when you run scrapy shell "some-url" in the command line, so I can replicate those steps in a Jupyter notebook?
Note: A very similar question was posted here, and the given answer works for me, but pulling in the additional third-party "Requests" library seems unnecessary and non-ideal.
Scrapy is an open-source framework for extracting data from websites. It is fast, simple, and extensible, and it is worth knowing for anyone who regularly needs to gather data this way.

When working with Scrapy, you first create a Scrapy project. You then add a spider, the component that actually fetches the data: move to the project's spiders folder and create a Python file there, for example gfgfetch.py, as sketched below.
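A minimal version of that spider might look like this (the class name, URL, and selector are placeholders of mine):

import scrapy

class GfgSpider(scrapy.Spider):
    name = "gfg"
    start_urls = ["some-url"]

    def parse(self, response):
        # response arrives here already populated by the Downloader
        yield {"title": response.css("title::text").get()}

You would then run it from the project directory with scrapy crawl gfg.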
You can approach the problem this way:
import requests
from scrapy.http import TextResponse

# Download the page with requests, then wrap the result in a Scrapy
# TextResponse so it behaves like the object you get in the Scrapy shell
res = requests.get('some-url')
response = TextResponse(res.url, body=res.text, encoding='utf-8')
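The resulting TextResponse supports Scrapy's selectors just like a shell response does, so (to sketch a couple of calls, assuming the page has a title and some links) you can do:

response.css('title::text').get()
response.xpath('//a/@href').getall()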