I have a Scrapy spider that uses Splash, which runs in Docker on localhost:8050, to render JavaScript before scraping. I am trying to run this on Heroku but have no idea how to configure Heroku to start Docker and run Splash before running my web: scrapy crawl abc dyno. Any guidance is greatly appreciated!
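From what I gather, you're expecting Splash to run on Heroku in a Docker container, with the Scrapy spider running in its own dyno.

First make sure the docker CLI and the heroku CLI are installed. Then log in to Heroku's container registry and push the existing Splash image to it:
heroku container:login
docker tag scrapinghub/splash registry.heroku.com/<app-name>/web
docker push registry.heroku.com/<app-name>/web
To test the application, run:
heroku open -a <app-name>
This should allow you to see the Splash UI at port 8050 on the Heroku host for this app name.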
You may need to make sure that $PORT is set appropriately, as the EXPOSE Docker configuration is not respected on Heroku (https://devcenter.heroku.com/articles/container-registry-and-runtime#dockerfile-commands-and-runtime).

Then configure your application to point to <app-host-name>:8050. The Scrapy spider should now be able to send requests to the Splash instance started earlier.

I ran into the same problem and finally managed to deploy the Splash Docker image on Heroku. This is my solution: I cloned the Splash project from GitHub and changed the Dockerfile.
CMD python3 /app/bin/splash --proxy-profiles-path /etc/splash/proxy-profiles --js-profiles-path /etc/splash/js-profiles --filters-path /etc/splash/filters --lua-package-path /etc/splash/lua_modules/?.lua --port $PORT
Notice that I added the option --port $PORT. This makes Splash listen on the port specified by Heroku instead of the default (8050).
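The same "use $PORT if Heroku set it, otherwise fall back to 8050" rule can be written out explicitly. This is only a sketch of the pattern in Python, not part of Splash itself; resolve_port and DEFAULT_SPLASH_PORT are hypothetical names introduced for illustration.

```python
import os

# Splash's default port, as mentioned above.
DEFAULT_SPLASH_PORT = 8050


def resolve_port(environ=os.environ, default=DEFAULT_SPLASH_PORT):
    """Return the port to listen on: $PORT when set, else the default.

    Hypothetical helper: it mirrors what `--port $PORT` achieves in the
    Dockerfile CMD, with a fallback for local runs where PORT is unset.
    """
    raw = environ.get("PORT", "").strip()
    if not raw:
        return default
    port = int(raw)  # raises ValueError if PORT is not numeric
    if not 0 < port < 65536:
        raise ValueError(f"PORT out of range: {port}")
    return port


if __name__ == "__main__":
    print(resolve_port())
```

On Heroku the platform sets PORT for the web process, so the fallback would only apply when running the container locally without -e PORT=...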
A fork of the project with this change is available here. You just need to build the Docker image and push it to Heroku's registry, like you did before. You can test it locally first, but you must pass the PORT environment variable when running the container:
sudo docker run -p 80:80 -e PORT=80 mynewsplashimage
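Once Splash is reachable, the spider still has to be told where it lives. Below is a minimal sketch of a settings.py fragment, assuming the spider uses the scrapy-splash plugin (the question does not say which integration is used). The SPLASH_URL environment variable and the splash_url_from_env helper are hypothetical names chosen for this example; the middleware entries are the ones listed in the scrapy-splash README.

```python
import os


def splash_url_from_env(environ=os.environ, default="http://localhost:8050"):
    """Return the Splash base URL without a trailing slash.

    Hypothetical helper: reads a SPLASH_URL environment variable (an
    assumption of this sketch) and falls back to the local Docker address
    from the question.
    """
    return environ.get("SPLASH_URL", default).rstrip("/")


# settings.py fragment, assuming scrapy-splash is installed.
SPLASH_URL = splash_url_from_env()

DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
```

With this in place, the same spider code should work locally (no variable set) and on Heroku, where the variable could be set with heroku config:set on the spider's app to the Splash app's address.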