I have an Azure Web App that I want to use to screen-scrape a website when I call an action on a controller, like so:
// Needs: using OpenQA.Selenium; using OpenQA.Selenium.PhantomJS;
using (var driver = new PhantomJSDriver())
{
    // Assigning Url already performs the navigation, so no separate Navigate() call is needed.
    driver.Url = "http://url.com";
    var pathElement = driver.FindElementByXPath("//table[@class='someclassname']");
    string innerHtml = "";
    IJavaScriptExecutor js = driver as IJavaScriptExecutor;
    if (js != null)
    {
        // Read the table's markup through the browser's JavaScript engine.
        innerHtml = (string)js.ExecuteScript("return arguments[0].innerHTML;", pathElement);
    }
    return innerHtml;
}
This works fine locally, but when I deploy to my Azure Web App I get this error:
Cannot start the driver service on http://localhost:51169/
I assume this has to do with firewalls, since I need to approve PhantomJS in my firewall settings the first time the app runs. My question is: how do I get this to work when deployed in Azure? Is it even possible, or do I need to set this up as a unit test and run it from within Visual Studio?
PhantomJS does not currently work in the sandbox that Azure Web Apps run in. See the wiki for a list of things that are known not to work, as well as lots of other information about the sandbox.
I would rethink your solution of using Selenium here. Selenium is meant for automating the manual testing of your web app: filling out a form, clicking a button, and so on.
Even if Selenium and the PhantomJS driver did run on your Azure Web App without issues, you would have a bottleneck of one browser instance per HTTP request. I suspect you would run into performance problems very soon.
Furthermore, starting PhantomJS, requesting a page, interacting with it, and shutting PhantomJS down again takes a long time.
In your case, it sounds like you're not interacting with your source site, you just need data. So perhaps just parsing the HTML DOM will suffice.
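A minimal sketch of that approach, assuming the table is already present in the HTML the server returns (rather than being generated by JavaScript) and that the third-party HtmlAgilityPack NuGet package is installed. Neither assumption is confirmed by the question. The class and method names are hypothetical; the URL and XPath are the placeholders from the question.

```csharp
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack; // third-party NuGet package; an assumption, not part of the question

public static class TableScraper // hypothetical helper name
{
    // Reuse one HttpClient for the lifetime of the app instead of creating one per request.
    private static readonly HttpClient Client = new HttpClient();

    // Parses raw HTML and returns the inner HTML of the first node matching the XPath,
    // or an empty string when nothing matches.
    public static string ExtractInnerHtml(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node == null ? "" : node.InnerHtml;
    }

    // Downloads the page with a plain HTTP request (no browser process) and extracts the table.
    public static async Task<string> GetTableHtmlAsync(string url)
    {
        string html = await Client.GetStringAsync(url);
        return ExtractInnerHtml(html, "//table[@class='someclassname']");
    }
}
```

From an async controller action this could be called as `var innerHtml = await TableScraper.GetTableHtmlAsync("http://url.com");`. Because it only makes an outbound HTTP request and starts no separate driver process, it should sidestep the driver-service error above, though it will not see any content the page builds with client-side script.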