Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Scrapy crawler in Cron job

Tags:

I want to execute my scrapy crawler from cron job .

i create bash file getdata.sh where scrapy project is located with it's spiders

#!/bin/bash
cd /myfolder/crawlers/
scrapy crawl my_spider_name

My crontab looks like this , I want to execute it in every 5 minute

 */5 * * * * sh /myfolder/crawlers/getdata.sh 

but it don't works , whats wrong , where is my error ?

when I execute my bash file from terminal sh /myfolder/crawlers/getdata.sh it works fine

like image 555
Beka Avatar asked Jun 21 '13 12:06

Beka


2 Answers

I solved this problem including PATH into bash file

#!/bin/bash

cd /myfolder/crawlers/
PATH=$PATH:/usr/local/bin
export PATH
scrapy crawl my_spider_name
like image 99
Beka Avatar answered Oct 06 '22 02:10

Beka


Adding the following lines in crontab -e runs my scrapy crawl at 5AM every day. This is a slightly modified version of crocs' answer

PATH=/usr/bin
* 5 * * * cd project_folder/project_name/ && scrapy crawl spider_name

Without setting $PATH, cron would give me an error "command not found: scrapy". I guess this is because /usr/bin is where scripts to run programs are stored in Ubuntu.

Note that the complete path for my scrapy project is /home/user/project_folder/project_name. I ran the env command in cron and noticed that the working directory is /home/user. Hence I skipped /home/user in my crontab above

The cron log can be helpful while debugging

grep CRON /var/log/syslog
like image 45
NFern Avatar answered Oct 06 '22 02:10

NFern