I managed to run a Scrapy program in an Ubuntu terminal. However, I cannot use Ctrl+Z and the bg command to let it run in the background: the spider connection is closed every time I press Ctrl+Z.
Is there any workaround or way to solve this issue?
The simplest solution is to use nohup together with &, with the following syntax:
nohup python parser.py &
While the & suffix gets it running in the background, closing the session would kill the process anyway. nohup creates a session-independent process, suitable for all kinds of environments (such as SSH sessions and remote servers, for example), and stores all console output in a log file.
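As a sketch of the pattern described above (using sleep as a stand-in for the actual long-running command, e.g. python parser.py):

```shell
# Sketch of the nohup pattern; `sleep 2` stands in for the real
# long-running command (e.g. `python parser.py`).
nohup sleep 2 > crawl.log 2>&1 &   # detach from the terminal's SIGHUP, log output
pid=$!                             # remember the PID so the job can be checked or killed later
echo "background job started with PID $pid"
wait "$pid"                        # in practice you would simply log out here
echo "job finished"
```

Redirecting both stdout and stderr (2>&1) into one file keeps nohup's own "ignoring input" notice together with the program's output.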
If you run your spider with scrapy crawl:
If you want to keep the logs:
scrapy crawl my_spider > /path/to/logfile.txt 2>&1 &
If you want to dismiss the logs:
scrapy crawl my_spider > /dev/null 2>&1 &
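If you prefer the Ctrl+Z-and-bg workflow from the question, bash's disown builtin is a possible alternative (a sketch, assuming bash): after the job is backgrounded, disown -h detaches it from the shell so it is not sent SIGHUP when you log out.

```shell
# In an interactive bash session you would press Ctrl+Z and then type `bg`;
# in a script we start the job in the background directly.
sleep 5 > /dev/null 2>&1 &   # stand-in for the real `scrapy crawl my_spider`
pid=$!
disown -h "$pid"             # bash builtin: do not forward SIGHUP to this job on shell exit
kill -0 "$pid"               # the job is still running, now detached from the shell
```

Unlike nohup, disown can be applied after the fact, which matches the case where you already started the spider and only then decided to put it in the background.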