I want to run my Scrapy crawler from a cron job.
I created a bash file, getdata.sh, in the directory where the Scrapy project and its spiders are located:
#!/bin/bash
cd /myfolder/crawlers/
scrapy crawl my_spider_name
My crontab looks like this; I want to run the script every 5 minutes:
*/5 * * * * sh /myfolder/crawlers/getdata.sh
But it doesn't work. What's wrong? Where is my error?
When I run the bash file from a terminal with sh /myfolder/crawlers/getdata.sh, it works fine.
I solved this problem by adding the directory that contains scrapy to PATH inside the bash file:
#!/bin/bash
cd /myfolder/crawlers/
PATH=$PATH:/usr/local/bin
export PATH
scrapy crawl my_spider_name
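A slightly more defensive variant of the script above, as a sketch only. It keeps the same PATH fix but also checks that scrapy can be found and appends all output to a log file, since cron jobs otherwise fail silently. The log location /tmp/getdata.log and the run_spider function name are assumptions for illustration and are not part of the original script.

```shell
#!/bin/bash
# Paths and spider name are the ones from the question.
PROJECT_DIR=/myfolder/crawlers
SPIDER=my_spider_name
LOG_FILE=/tmp/getdata.log   # hypothetical log location, pick your own

# cron starts jobs with a minimal PATH, so add the directory that holds scrapy
PATH="$PATH:/usr/local/bin"
export PATH

run_spider() {
    # Report a clear message instead of a bare "command not found"
    if ! command -v scrapy >/dev/null 2>&1; then
        echo "scrapy not found in PATH=$PATH" >&2
        return 127
    fi
    cd "$PROJECT_DIR" || return 1
    scrapy crawl "$SPIDER"
}

# Append stdout and stderr to the log so failures under cron are visible
status=0
run_spider >>"$LOG_FILE" 2>&1 || status=$?
echo "$(date) run_spider finished with status $status" >>"$LOG_FILE"
```

After the next scheduled run, reading the end of the log file should show whether the problem is the PATH, the project directory, or the spider itself.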
Adding the following lines via crontab -e runs my Scrapy crawl at 5 AM every day. This is a slightly modified version of crocs' answer:
PATH=/usr/bin
0 5 * * * cd project_folder/project_name/ && scrapy crawl spider_name
Without setting PATH, cron would give me the error "command not found: scrapy". I guess this is because /usr/bin is where the scripts that launch programs are stored on Ubuntu.
Note that the complete path of my Scrapy project is /home/user/project_folder/project_name. I ran the env command from cron and noticed that the working directory is /home/user, so I left /home/user out of the path in my crontab above.
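To see what a job would find before waiting for cron to fire, one option is to approximate cron's stripped-down environment from a normal terminal. This is a rough sketch: the cron_like helper name is made up for illustration, and PATH=/usr/bin:/bin is assumed to be the cron default, which is common on Debian and Ubuntu but may differ on your system.

```shell
# Run a command string in an environment that approximates what cron provides.
# Assumption: cron's default PATH is /usr/bin:/bin on this system.
cron_like() {
    env -i HOME="${HOME:-/}" SHELL=/bin/sh PATH=/usr/bin:/bin /bin/sh -c "$1"
}

# Show the working directory and PATH the job would start with
cron_like 'cd "$HOME" && pwd && echo "PATH=$PATH"'

# Check whether scrapy can be found with that PATH
cron_like 'command -v scrapy || echo "scrapy not found on cron-like PATH"'
```

If the second command reports that scrapy is not found, the directory printed by command -v scrapy in your normal terminal is the one to add to PATH in the crontab or the script.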
The cron log can be helpful while debugging:
grep CRON /var/log/syslog
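On a busy machine that grep returns entries for every job, so a small helper that narrows the output to one command can save time. This is only a sketch: the function name cron_log_for is made up, and /var/log/syslog is assumed as the default log file, as in the command above.

```shell
# Show the most recent cron log entries that mention a given command.
# Usage: cron_log_for PATTERN [LOGFILE]
cron_log_for() {
    pattern=$1
    logfile=${2:-/var/log/syslog}   # default taken from the grep command above
    # Keep cron lines only, then match the pattern literally, then show the last 20
    grep CRON "$logfile" | grep -F -- "$pattern" | tail -n 20
}
```

For example, cron_log_for getdata.sh would list the latest runs of the script from the question, assuming your user is allowed to read the log file.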