I want to directly interact with a Scrapy response object in a Jupyter notebook, the same way you can after entering the Scrapy shell by typing scrapy shell "some-url" in the command line.
In a notebook, I can run these commands without error:
import scrapy
request = scrapy.Request("some-url")
response = scrapy.http.Response("some-url")
But request and response both have an empty body property. According to the docs:
Typically, Request objects are generated in the spiders and pass across the system until they reach the Downloader, which executes the request and returns a Response object which travels back to the spider that issued the request.
It seems I'm missing the step where "the Downloader" executes a request object and returns a Response object. I can't figure out how that works.
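For context, here is a sketch of how I understand that cycle can be driven from a standalone script, using a throwaway spider and scrapy.crawler.CrawlerProcess (the spider and the captured dict are just scaffolding for illustration); process.start() runs the Twisted reactor, which can only be started once per process, so this doesn't translate cleanly to a notebook:

import scrapy
from scrapy.crawler import CrawlerProcess

captured = {}  # holds the Response the spider receives

class FetchSpider(scrapy.Spider):
    name = "fetch"
    start_urls = ["some-url"]

    def parse(self, response):
        # By the time parse() is called, the Downloader has executed
        # the Request and built a fully populated Response
        captured["response"] = response

process = CrawlerProcess(settings={"LOG_ENABLED": False})
process.crawl(FetchSpider)
process.start()  # blocks until the crawl finishes
response = captured["response"]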
Does anyone know what happens when you run scrapy shell "some-url" in the command line, so I can replicate those steps in a Jupyter notebook?
Note: A very similar question was posted here, and the given answer works for me, but pulling in the additional third-party "Requests" library seems unnecessary and non-ideal.
Scrapy is an open-source framework for extracting data from websites. It is fast, simple, and extensible, and it is worth knowing for anyone who regularly needs to gather data this way.

When working with Scrapy, you first create a Scrapy project. You then add a spider, the component that actually fetches the data: move to the project's spiders folder and create a Python file there, for example gfgfetch.py, as sketched below.
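A minimal version of that spider might look like this (the class name, URL, and selector are placeholders of mine):

import scrapy

class GfgSpider(scrapy.Spider):
    name = "gfg"
    start_urls = ["some-url"]

    def parse(self, response):
        # response arrives here already populated by the Downloader
        yield {"title": response.css("title::text").get()}

You would then run it from the project directory with scrapy crawl gfg.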
You can approach the problem this way:
import requests
from scrapy.http import TextResponse

# Download the page with requests, then wrap the result in a Scrapy
# TextResponse so it behaves like the object you get in the Scrapy shell
res = requests.get('some-url')
response = TextResponse(res.url, body=res.text, encoding='utf-8')
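The resulting TextResponse supports Scrapy's selectors just like a shell response does, so (to sketch a couple of calls, assuming the page has a title and some links) you can do:

response.css('title::text').get()
response.xpath('//a/@href').getall()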