
Run scrapy in background (Ubuntu)

I managed to run a Scrapy program in an Ubuntu terminal. However, I cannot use Ctrl+Z and the bg command to keep it running in the background: the spider connection is closed every time I press Ctrl+Z.

Is there any workaround or ways to solve the issue?

Asked May 31 '17 by Kennedy Kan

People also ask

How do I run Scrapy in terminal?

Using the scrapy tool: you can start by running the Scrapy tool with no arguments, and it will print some usage help and the available commands:

  Scrapy X.Y - no active project
  Usage: scrapy <command> [options] [args]
  Available commands:
    crawl    Run a spider
    fetch    Fetch a URL using the Scrapy downloader
    [...]

How do I run a Scrapy file?

Basic script: the key to running Scrapy from a Python script is the CrawlerProcess class, part of Scrapy's crawler module. It provides the engine to run Scrapy within a Python script; internally, the CrawlerProcess code imports Python's Twisted framework.

How do I start a Scrapy project?

To begin, run the scrapy startproject command followed by the name you want to give the project. The target website is located at https://books.toscrape.com. Opening the project in PyCharm, the project folder structure should look familiar at this point.

How do you get to the next Scrapy page?

Run the code with scrapy crawl spider -o next_page.json and check the result.


2 Answers

The simplest solution is to use nohup together with &, with the following syntax:

nohup python parser.py &

While the & suffix runs the command in the background, closing the terminal session would still kill the process. nohup makes the process immune to hangups, so it survives the end of the session (useful over SSH and on remote servers, for example) and, by default, appends all console output to a nohup.out log file.
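A minimal sketch of this approach, assuming the scraper's entry point is parser.py as in the answer (the scrapy.log and scrapy.pid file names are illustrative choices, not anything Scrapy requires):

```shell
# Start the scraper immune to hangups, in the background, with stdout and
# stderr collected in an explicit log file (otherwise nohup appends them
# to nohup.out in the current directory):
nohup python parser.py > scrapy.log 2>&1 &

# Remember the PID so the job can be inspected or stopped later:
echo $! > scrapy.pid
```

The session can then be closed safely; tail -f scrapy.log follows the crawl, and kill "$(cat scrapy.pid)" stops it.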

Answered Sep 24 '22 by Matías Zanolli


If you run your spider with scrapy crawl:

  • If you want to keep the logs: scrapy crawl my_spider > /path/to/logfile.txt 2>&1 &

  • If you want to dismiss the logs: scrapy crawl my_spider > /dev/null 2>&1 &
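The > file 2>&1 & idiom in both commands redirects stdout to the file, duplicates stderr onto stdout (so errors land in the same place), and backgrounds the job. A self-contained demonstration, using plain echo commands as a stand-in for scrapy crawl my_spider since the redirection behaviour itself does not depend on Scrapy:

```shell
# Stand-in for the spider: one line on stdout, one on stderr.
{ echo "item scraped"; echo "ERROR: retrying" >&2; } > crawl.log 2>&1 &

# Wait for the background job to finish, then inspect the log:
wait $!
cat crawl.log   # both the stdout line and the stderr line were captured
```

Swapping the log file for /dev/null, as in the second command, discards both streams instead of saving them.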

Answered Sep 21 '22 by Adrien Blanquer