Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Running selenium browser on server (Flask/Python/Heroku)

I am scraping some websites that seem to have pretty good protection against it. The only way I can get it to work is to use Selenium to load the page and then scrape stuff from that.

Currently this works on my local computer (a firefox windows opens and closed when I access my page and it's HTML is processed further in my script). However, I need my scraper to be accessible on the web. The scraper is embedded within a Flask app on Heroku. Is there a way to make the Selenium browser work on Heroku servers? Or are there any hosting providers where it can work?

like image 572
Javaaaa Avatar asked Apr 09 '13 14:04

Javaaaa


People also ask

Can I run Selenium on Heroku?

To use Selenium with Chrome on Heroku, you'll also need Chrome. We suggest one of these buildpacks: heroku-buildpack-google-chrome to run Chrome with the --headless flag. heroku-buildpack-xvfb-google-chrome to run Chrome against a virtual window server.

Does flask work with Heroku?

In this tutorial, you'll create a Python Flask example application and deploy it using Heroku, making it publicly available on the web. Heroku removes much of the infrastructure burden related to building and running web applications, allowing you to focus on creating an awesome app.

How do I connect my flask to Heroku?

Deploying Flask App on Heroku STEP 1 : Create a virtual environment with pipenv and install Flask and Gunicorn . STEP 2 : Create a “Procfile” and write the following code. STEP 3 : Create “runtime. txt” and write the following code.


2 Answers

Heroku, wonderful as it is, has a major limitation in that one cannot use custom software or in many cases, libraries. In providing an easy to use, centrally-controlled, managed stack, Heroku strips their servers down to prevent other usage.

What this boils down to is there is no Xorg on a Heroku dyno. Lack of Xorg and lack of ability to install custom software means no xvfb either, and no ability to run the browser that selenium expects to exist. Further, the browser is not generally available.

You'll have better luck with a cloud offering like AWS, where you can install custom software, including firefox, xvfb (to keep from needing all the Xorg overhead), and of course the rest of your scraping stack. This answer explains how to do it properly.

like image 180
Elliott Seyler Avatar answered Sep 24 '22 13:09

Elliott Seyler


There are buildpacks to make selenium work on heroku.

Add below buildpacks.

1) heroku buildpacks:add https://github.com/kevinsawicki/heroku-buildpack-xvfb-google-chrome/
2) heroku buildpacks:add https://github.com/heroku/heroku-buildpack-chromedriver

And set heroku stack to cedar-14 as shown below, as xvfb buildpack works only with cedar-14.

heroku stack:set cedar-14 -a stocksdata

Then point the google chrome location as below

options = ChromeOptions()
options.binary_location = "/app/.apt/usr/bin/google-chrome-stable"
driver = webdriver.Chrome(chrome_options=options)
like image 26
Shiva Tejesh Avatar answered Sep 22 '22 13:09

Shiva Tejesh