I intend to create an Android application that performs a headless login to a website and then scrapes some content from the subsequent page while maintaining the logged-in session.
I first used HtmlUnit in a normal Java project and it worked just fine, but I later found that HtmlUnit is not compatible with Android.
Then I tried the JSoup library, sending an HTTP POST request to the login form. But the resulting page does not load completely, since JSoup does not execute JavaScript.
It was then suggested that I have a look at Selendroid, which is actually an Android test automation framework. What I really need is an HTML parser that supports JavaScript and runs on Android. I find Selendroid quite difficult to understand; I can't even figure out which dependencies to use.
With Selenium WebDriver, the code would be as simple as the following. Can somebody show me a similar code example for Selendroid?
WebDriver driver = new FirefoxDriver();
driver.get("https://mail.google.com/");
driver.findElement(By.id("email")).sendKeys(myEmail);
driver.findElement(By.id("pass")).sendKeys(pass);
// Click on 'Sign In' button
driver.findElement(By.id("signIn")).click();
Unfortunately I didn't get Selendroid to work. But I found a workaround to scrape dynamic content using just Android's built-in WebView with JavaScript enabled.
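// `context` is assumed to be an Activity or Application Context
mWebView = new WebView(context);
mWebView.getSettings().setJavaScriptEnabled(true);
mWebView.addJavascriptInterface(new HtmlHandler(), "HtmlHandler");
mWebView.setWebViewClient(new WebViewClient() {
    @Override
    public void onPageFinished(WebView view, String url) {
        super.onPageFinished(view, url);
        // Compare string contents with equals(); == only compares references
        if (url.equals(urlToLoad)) {
            // Pass the HTML source to the HtmlHandler
            view.loadUrl("javascript:HtmlHandler.handleHtml(document.documentElement.outerHTML);");
        }
    }
});
// Start loading the page; onPageFinished fires once it has loaded
mWebView.loadUrl(urlToLoad);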
The JavaScript expression document.documentElement.outerHTML evaluates to the full HTML of the loaded page. The retrieved HTML string is then sent to the handleHtml method of the HtmlHandler class.
class HtmlHandler {
    @JavascriptInterface
    @SuppressWarnings("unused")
    public void handleHtml(String html) {
        // scrape the content here
    }
}
You may use a library like Jsoup to scrape the necessary content from the html String.
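As a rough illustration of what could go inside handleHtml, here is a minimal sketch that pulls the page title out of the HTML string. To keep it dependency-free it uses java.util.regex rather than Jsoup, which is a substitution on my part; for anything beyond a trivial case a real parser such as Jsoup is the more robust choice. The class name HtmlTitleExtractor and the method extractTitle are hypothetical names made up for this example.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
public class HtmlTitleExtractor {
    // Matches the contents of the first <title> element, case-insensitively,
    // allowing attributes on the tag and line breaks inside the text.
    private static final Pattern TITLE = Pattern.compile(
            "<title\\b[^>]*>(.*?)</title>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the trimmed title text, or an empty Optional if there is none.
    public static Optional<String> extractTitle(String html) {
        if (html == null) {
            return Optional.empty();
        }
        Matcher m = TITLE.matcher(html);
        if (!m.find()) {
            return Optional.empty();
        }
        // Collapse runs of whitespace (including newlines) into single spaces
        return Optional.of(m.group(1).trim().replaceAll("\\s+", " "));
    }

    public static void main(String[] args) {
        String html = "<html><head><title> Inbox \n (3) </title></head><body></body></html>";
        System.out.println(extractTitle(html).orElse("(no title)"));
    }
}
```

In the WebView setup above, handleHtml could call HtmlTitleExtractor.extractTitle(html) and hand the result to whatever part of the app needs it.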
I have never used Selendroid, so I'm not really sure about this, but searching the web I found this example and, according to it, I suppose that the translation of your code from Selenium to Selendroid would be:
Translation code (in my opinion)
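import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.remote.DesiredCapabilities;
// The Selendroid classes (SelendroidLauncher, SelendroidConfiguration,
// SelendroidCapabilities, SelendroidDriver) must be imported too; their
// packages depend on the Selendroid version, so they are not listed here.

public class MobileWebTest {
    private SelendroidLauncher selendroidServer = null;
    private WebDriver driver = null;

    @Test
    public void doTest() {
        // myEmail and pass are assumed to be defined elsewhere, as in the question
        driver.get("https://mail.google.com/");
        // sendKeys() and click() return void, so their results cannot be assigned
        driver.findElement(By.id("email")).sendKeys(myEmail);
        driver.findElement(By.id("pass")).sendKeys(pass);
        driver.findElement(By.id("signIn")).click();
        // The driver is closed in stopSelendroidServer(), not here
    }

    @Before
    public void startSelendroidServer() throws Exception {
        if (selendroidServer != null) {
            selendroidServer.stopSelendroid();
        }
        SelendroidConfiguration config = new SelendroidConfiguration();
        selendroidServer = new SelendroidLauncher(config);
        selendroidServer.launchSelendroid();
        DesiredCapabilities caps = SelendroidCapabilities.android();
        driver = new SelendroidDriver(caps);
    }

    @After
    public void stopSelendroidServer() {
        if (driver != null) {
            driver.quit();
        }
        if (selendroidServer != null) {
            selendroidServer.stopSelendroid();
        }
    }
}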
What you have to add to your project
It seems that you have to add the Selendroid standalone jar file to your project. If you are unsure how to add an external jar to an Android project, see this question: How can I use external JARs in an Android project?
Here you can download the jar file: jar file
It also seems that adding the standalone jar file alone is not enough. You should also add the selendroid-client jar file that matches the version of the standalone jar you are using.
You can download it from here: client jar file
I hope this helps!