Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to load ajax with HtmlUnit?

import java.io.IOException;
import java.net.MalformedURLException;
import java.util.List;

import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlButton;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;

public class YoutubeBot {
private static final String YOUTUBE = "http://www.youtube.com";

public static void main(String[] args) throws FailingHttpStatusCodeException, MalformedURLException, IOException {
    WebClient webClient = new WebClient();
    webClient.setThrowExceptionOnScriptError(false);

    // This is equivalent to typing youtube.com to the adress bar of browser
    HtmlPage currentPage = webClient.getPage("http://www.youtube.com/results?search_type=videos&search_query=official+music+video&search_sort=video_date_uploaded&suggested_categories=10%2C24&uni=3");

    // Get form where submit button is located
    HtmlForm searchForm = (HtmlForm) currentPage.getElementById("masthead-search");

    // Get the input field.
    HtmlTextInput searchInput = (HtmlTextInput) currentPage.getElementById("masthead-search-term");
    // Insert the search term.
    searchInput.setText("java");

    // Workaround: create a 'fake' button and add it to the form.
    HtmlButton submitButton = (HtmlButton) currentPage.createElement("button");
    submitButton.setAttribute("type", "submit");
    searchForm.appendChild(submitButton);

    //Workaround: use the reference to the button to submit the form. 
    HtmlPage newPage = submitButton.click();

    //Find all links on page with given class
    final List<HtmlAnchor> listLinks = (List<HtmlAnchor>) currentPage.getByXPath("//a[@class='ux-thumb-wrap result-item-thumb']");      

    //Print all links to console
    for (int i=0; i<listLinks.size(); i++)
        System.out.println(YOUTUBE + listLinks.get(i).getAttribute("href"));

    }
}

This code is working but I just want to sort youtube clips for example by upload date. How to do this with HtmlUnit? I have to click on filter, this should load content by ajax request and then I should click on "Upload date" link. I just don't know this first step, to load ajax content. Is this possible with HtmlUnit?

like image 333
Иван Бишевац Avatar asked Jul 22 '11 22:07

Иван Бишевац


2 Answers

This worked for me. Set this

webClient.setAjaxController(new NicelyResynchronizingAjaxController());

This would cause all ajax calls to be synchronous.

This is how I setup my WebClient object

WebClient webClient = new WebClient(BrowserVersion.CHROME);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getOptions().setCssEnabled(false);
webClient.getOptions().setUseInsecureSSL(true);
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
webClient.getCookieManager().setCookiesEnabled(true);
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getCookieManager().setCookiesEnabled(true);
like image 200
Varun Tulsian Avatar answered Oct 19 '22 06:10

Varun Tulsian


Here's one way to do it:

  1. Search the page as you did in your previous question.
  2. Select search-lego-refinements block by id.
  3. Use XPath to navigate to the URL (//ul/li/a when you start from the previous id).
  4. Click the selected link.

The following code sample shows how it could be done:

import java.io.IOException;
import java.net.MalformedURLException;
import java.util.List;

import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlButton;
import com.gargoylesoftware.htmlunit.html.HtmlElement;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;

public class YoutubeBot {
   private static final String YOUTUBE = "http://www.youtube.com";

   @SuppressWarnings("unchecked")
   public static void main(String[] args) throws FailingHttpStatusCodeException, MalformedURLException, IOException {
      WebClient webClient = new WebClient();
      webClient.setThrowExceptionOnScriptError(false);

      // This is equivalent to typing youtube.com to the adress bar of browser
      HtmlPage currentPage = webClient.getPage(YOUTUBE);

      // Get form where submit button is located
      HtmlForm searchForm = (HtmlForm) currentPage.getElementById("masthead-search");

      // Get the input field
      HtmlTextInput searchInput = (HtmlTextInput) currentPage.getElementById("masthead-search-term");

      // Insert the search term
      searchInput.setText("java");

      // Workaround: create a 'fake' button and add it to the form
      HtmlButton submitButton = (HtmlButton) currentPage.createElement("button");
      submitButton.setAttribute("type", "submit");
      searchForm.appendChild(submitButton);

      // Workaround: use the reference to the button to submit the form.
      currentPage = submitButton.click();

      // Get the div containing the filters
      HtmlElement filterDiv = currentPage.getElementById("search-lego-refinements");

      // Select the first link from the filter block (Upload date)
      HtmlAnchor sortByDateLink = ((List<HtmlAnchor>) filterDiv.getByXPath("//ul/li/a")).get(0);

      // Click the 'Upload date' link
      currentPage = sortByDateLink.click();

      System.out.println(currentPage.asText());
   }
}

You could just browse the correct query URL as well (http://www.youtube.com/results?search_type=videos&search_query=nyan+cat&search_sort=video_date_uploaded).

But then you would have to encode your search parameter(s) (replace spaces with + for example).

like image 39
Jasper Avatar answered Oct 19 '22 07:10

Jasper