
Selendroid as a web scraper

I intend to create an Android application that performs a headless login to a website and then scrapes some content from the subsequent page while maintaining the logged-in session.

I first used HtmlUnit in a normal Java project and it worked just fine, but I later found that HtmlUnit is not compatible with Android.

Then I tried the Jsoup library, sending an HTTP POST request to the login form. But the resulting page does not load completely, because Jsoup does not execute JavaScript.

I was then advised to look at Selendroid, which is actually an Android test automation framework. What I really need is an HTML parser that supports JavaScript and runs on Android. I find Selendroid quite difficult to understand; I can't even figure out which of these dependencies to use:
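For reference, the "POST to the login form, then keep the session" step can be sketched with only java.net classes instead of Jsoup (a swapped-in technique, not what the question used). This is a minimal sketch under assumptions: the class name SessionLogin and its method names are hypothetical, the form field names depend on the target site, and, like Jsoup, it does not execute JavaScript, so it only helps with the session part of the problem.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical helper: form login plus follow-up request sharing one cookie store.
final class SessionLogin {

    // Install a process-wide cookie store so that cookies set by the login
    // response are sent automatically on later HttpURLConnection requests.
    static void enableCookies() {
        CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));
    }

    // Encode form fields as application/x-www-form-urlencoded.
    static String encodeForm(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : fields.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is a required charset on every Java platform.
            throw new IllegalStateException(ex);
        }
        return sb.toString();
    }

    // POST the form fields to the given URL and return the response body.
    static String post(String url, Map<String, String> fields) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(encodeForm(fields).getBytes(StandardCharsets.UTF_8));
        }
        return readBody(conn);
    }

    // GET a page; session cookies from earlier responses are attached automatically.
    static String get(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        return readBody(conn);
    }

    // Read the whole response body as a UTF-8 string.
    private static String readBody(HttpURLConnection conn) throws IOException {
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}
```

Typical use would be to call SessionLogin.enableCookies() once, then SessionLogin.post(loginUrl, fields) followed by SessionLogin.get(contentUrl), where loginUrl, contentUrl and the field names are placeholders for the real site. On Android these calls must run off the main thread.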

  • selendroid-client
  • selendroid-standalone
  • selendroid-server

With Selenium WebDriver, the code would be as simple as the following. Can somebody show me a similar code example for Selendroid?

    WebDriver driver = new FirefoxDriver();
    driver.get("https://mail.google.com/");

    driver.findElement(By.id("email")).sendKeys(myEmail);
    driver.findElement(By.id("pass")).sendKeys(pass);

    // Click on 'Sign In' button
    driver.findElement(By.id("signIn")).click();

Also,

  1. What dependencies should I add to my build.gradle file?
  2. Which Selendroid libraries should I import?
Gayan Weerakutti asked May 05 '15 16:05



2 Answers

Unfortunately, I didn't get Selendroid to work. But I found a workaround: scraping the dynamic content using just Android's built-in WebView with JavaScript enabled.

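// 'context' is assumed to be any valid Context (e.g. the Activity);
// 'urlToLoad' is the URL of the page to scrape.
mWebView = new WebView(context);
mWebView.getSettings().setJavaScriptEnabled(true);
mWebView.addJavascriptInterface(new HtmlHandler(), "HtmlHandler");

mWebView.setWebViewClient(new WebViewClient() {
    @Override
    public void onPageFinished(WebView view, String url) {
        super.onPageFinished(view, url);

        // Compare string contents with equals(), not ==
        if (url.equals(urlToLoad)) {
            // Pass the HTML source to HtmlHandler.handleHtml()
            view.loadUrl("javascript:HtmlHandler.handleHtml(document.documentElement.outerHTML);");
        }
    }
});

// Start loading the page; onPageFinished() fires once it has loaded
mWebView.loadUrl(urlToLoad);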

The JavaScript expression document.documentElement.outerHTML returns the full HTML of the loaded page. The retrieved HTML string is then passed to the handleHtml method of the HtmlHandler class.

class HtmlHandler {
    @JavascriptInterface
    @SuppressWarnings("unused")
    public void handleHtml(String html) {
        // Scrape the content here
    }
}

You can then use a library like Jsoup to extract the necessary content from the HTML string.

Gayan Weerakutti answered Oct 16 '22 22:10


I have never used Selendroid, so I'm not really sure about this, but searching the net I found this example and, based on it, I suppose your Selenium code would translate to Selendroid as follows:

Translation code (in my opinion)

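// Imports for JUnit 4 and Selenium. The Selendroid classes (SelendroidLauncher,
// SelendroidConfiguration, SelendroidCapabilities, SelendroidDriver) must be
// imported too; their packages depend on the Selendroid version you use.
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.remote.DesiredCapabilities;

public class MobileWebTest {
  private SelendroidLauncher selendroidServer = null;
  private WebDriver driver = null;

  @Test
  public void doTest() {
    driver.get("https://mail.google.com/");

    // sendKeys() and click() return void, so their results cannot be assigned
    driver.findElement(By.id("email")).sendKeys(myEmail);
    driver.findElement(By.id("pass")).sendKeys(pass);

    driver.findElement(By.id("signIn")).click();

    // The driver is closed in stopSelendroidServer() (@After), so no quit() here
  }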

  @Before
  public void startSelendroidServer() throws Exception {
    if (selendroidServer != null) {
      selendroidServer.stopSelendroid();
    }

    SelendroidConfiguration config = new SelendroidConfiguration();

    selendroidServer = new SelendroidLauncher(config);
    selendroidServer.launchSelendroid();

    DesiredCapabilities caps = SelendroidCapabilities.android();

    driver = new SelendroidDriver(caps);
  }

  @After
  public void stopSelendroidServer() {
    if (driver != null) {
      driver.quit();
    }
    if (selendroidServer != null) {
      selendroidServer.stopSelendroid();
    }
  }
}

What you have to add to your project

It seems that you have to add the Selendroid standalone JAR file to your project. If you are unsure how to add an external JAR to an Android project, see this question: How can I use external JARs in an Android project?

Here you can download the jar file: jar file

It also seems that adding the standalone JAR alone is not enough. You should also add the selendroid-client JAR file that matches the version of the standalone JAR you have.

You can download it from here: client jar file

I hope this helps!

Francisco Romero answered Oct 16 '22 22:10