On a Mac, I have Jupyter installed. When I run jupyter notebook
from the root folder of my Scrapy project, it opens the notebook, and I can browse all of the project files at this point.
How do I execute the project from the notebook?
If I click the Running tab, under Terminals, I see:
There are no terminals running.
Scrapy is an open-source framework for extracting data from websites. It is fast, simple, and extensible. Every data scientist should have some familiarity with it, as they often need to gather data this way.
Using the scrapy tool
You can start by running the Scrapy tool with no arguments, and it will print some usage help and the available commands:

Scrapy X.Y - no active project

Usage:
  scrapy <command> [options] [args]

Available commands:
  crawl         Run a spider
  fetch         Fetch a URL using the Scrapy downloader
[...]
Basic Script
The key to running Scrapy in a Python script is the CrawlerProcess class. This class lives in the scrapy.crawler module, and it provides the engine to run Scrapy within a Python script. Internally, the CrawlerProcess class imports Python's Twisted framework.
To begin the project, we can run the scrapy startproject command followed by the name we will give the project. The target website is located at https://books.toscrape.com. We can open the project in PyCharm, and the project folder structure should look familiar to you at this point.
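As a sketch of that step (the project name books is chosen here purely for illustration), the command and the layout Scrapy generates look like this:

```shell
scrapy startproject books
# This creates the following structure:
# books/
#     scrapy.cfg          # deploy configuration file
#     books/              # the project's Python module
#         __init__.py
#         items.py        # item definitions
#         middlewares.py  # project middlewares
#         pipelines.py    # item pipelines
#         settings.py     # project settings
#         spiders/        # directory where your spiders live
#             __init__.py
```

Your spider files go in the spiders/ directory, and scrapy crawl looks them up by the name attribute each spider defines.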
There are two main ways to achieve that:
1.
Under the Files tab, open a new terminal: New > Terminal
Then simply run your spider: scrapy crawl [options] <spider>
2.
Create a new notebook and use the CrawlerProcess or CrawlerRunner class to run the spider in a cell:
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
process = CrawlerProcess(get_project_settings())
process.crawl('your-spider')  # the name attribute of your spider class
process.start() # the script will block here until the crawling is finished
Scrapy docs - Run Scrapy from a script
There is no need for a terminal to run the spider class. Just add the following code in a Jupyter notebook cell:
import scrapy
from scrapy.crawler import CrawlerProcess
class MySpider(scrapy.Spider):
# Your spider definition
...
process = CrawlerProcess({
'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})
process.crawl(MySpider)
process.start() # the script will block here until the crawling is finished
For more information, see the Scrapy documentation on running Scrapy from a script.