I am trying to run Scrapyd on a virtual Ubuntu 16.04 server, to which I connect via SSH. When I start Scrapyd by simply running
$ scrapyd
I can connect to the web interface by going to http://82.165.102.18:6800.
However, once I close the SSH connection, the web interface is no longer available. I therefore think I need to run Scrapyd in the background as a service somehow.
After some research I came across a few proposed solutions.
Does anyone know what the best / recommended solution is? Unfortunately, the Scrapyd documentation is rather thin and outdated.
For some background, I need to run about 10-15 spiders on a daily basis.
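(For reference, each run is triggered through Scrapyd's schedule.json API, along these lines, where myproject and myspider are placeholder names:)
$ curl http://localhost:6800/schedule.json -d project=myproject -d spider=myspider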
sudo nano /lib/systemd/system/scrapyd.service
Then copy-paste the following:
[Unit]
Description=Scrapyd service
After=network.target
[Service]
User=<YOUR-USER>
Group=<USER-GROUP>
WorkingDirectory=/any/directory/here
ExecStart=/usr/local/bin/scrapyd
[Install]
WantedBy=multi-user.target
Then reload systemd so it picks up the new unit file, and enable the service:
sudo systemctl daemon-reload
sudo systemctl enable scrapyd.service
Then start the service:
sudo systemctl start scrapyd.service
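To verify that the service keeps running after you close your SSH session, check its status and poll Scrapyd's daemonstatus.json endpoint (assuming the default port 6800):
sudo systemctl status scrapyd
curl http://localhost:6800/daemonstatus.json
The second command should return a small JSON document with "status": "ok" and the counts of running, pending and finished jobs.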
Use this command:
cd /path/to/your/project/folder && nohup scrapyd > /dev/null 2>&1 &
Now you can close your SSH connection but scrapyd will keep running.
To make sure that scrapyd runs automatically whenever your server restarts, do this:
Copy the output of echo $PATH from your terminal, and then open your crontab with crontab -e.
Now at the very top of that file, write this:
PATH=YOUR_COPIED_CONTENT
And now at the end of your crontab, write this:
@reboot cd /path/to/your/project/folder && nohup scrapyd > /dev/null 2>&1 &
This means that each time your server is restarted, the above command will run automatically. (The portable > /dev/null 2>&1 redirection is used here because cron runs commands with /bin/sh, which does not understand bash's >& shortcut.)
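Put together, the finished crontab might look something like this (the PATH value and project folder are examples; substitute your own):
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
@reboot cd /home/ubuntu/crawler && nohup scrapyd > /dev/null 2>&1 &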
To have scrapyd run in the background, you can simply do:
$ scrapyd &
The & at the end sends scrapyd to the background. Note that the process is still attached to your shell, so it may be killed when you log out; prefix it with nohup (or use disown) if you want it to survive the end of your SSH session.
Or, you can use the daemon utility to start scrapyd from your project folder:
$ daemon --chdir=/home/ubuntu/crawler scrapyd
As you mentioned, to use daemon you first need to install it on your Ubuntu machine:
$ sudo apt-get install daemon
After having scrapyd run as a daemon via one of the approaches above, you should be able to access your scrapyd web interface after closing your SSH connection.
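If you go the daemon route, it may also be worth naming the process and letting daemon restart scrapyd if it crashes; a sketch using flags from daemon's man page (the /home/ubuntu/crawler path is just an example):
$ daemon --name=scrapyd --respawn --chdir=/home/ubuntu/crawler scrapyd
$ daemon --name=scrapyd --stop    # stop the named daemon later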
Supervisor is a great way to daemonize scrapyd. Installation is generally straightforward. Once you have it set up, starting and stopping the service is as easy as:
$ supervisorctl start scrapyd
$ supervisorctl stop scrapyd
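For those commands to work, supervisor needs a program section for scrapyd in its configuration (e.g. in /etc/supervisord.conf). A minimal sketch, with the command path, directory and user as assumptions to adapt:
[program:scrapyd]
command=/usr/local/bin/scrapyd
directory=/home/ubuntu/crawler
user=ubuntu
autostart=true
autorestart=true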
If you choose this route, note that supervisord may throw a warning about not finding the configuration file. One way to fix this is to simply add a reference to the configuration in the init.d script:
prog_bin="${exec_prefix}/bin/supervisord -c /etc/supervisord.conf"