
Best HTTP library for Java?

Tags: java, http

I want to develop an HTTP client in Java for a college project. It needs to log in to a site, extract data from HTML pages, and fill in and submit forms. I don't know which HTTP library to use:

Apache HTTP client doesn't build a DOM model, but it handles HTTP redirects and multi-threading.

HTTPUnit builds a DOM model and makes it easy to work with forms, fields, tables, etc., but I don't know how well it works with multi-threading and proxy settings.

Any advice?

user445550 asked Sep 12 '10 11:09


3 Answers

It sounds like you are trying to create a web-scraping application. For this purpose, I recommend the HtmlUnit library.

It makes it easy to work with forms, proxies, and data embedded in web pages. Under the hood I think it uses Apache's HttpClient to handle HTTP requests, but this is probably too low-level for you to be worried about.

With this library you can control a web page in Java the same way you would control it in a web browser: clicking a button, typing text, selecting values.

Here are some examples from HtmlUnit's getting started page:

Submitting a form:

@Test
public void submittingForm() throws Exception {
    final WebClient webClient = new WebClient();

    // Get the first page
    final HtmlPage page1 = webClient.getPage("http://some_url");

    // Get the form that we are dealing with and within that form, 
    // find the submit button and the field that we want to change.
    final HtmlForm form = page1.getFormByName("myform");

    final HtmlSubmitInput button = form.getInputByName("submitbutton");
    final HtmlTextInput textField = form.getInputByName("userid");

    // Change the value of the text field
    textField.setValueAttribute("root");

    // Now submit the form by clicking the button and get back the second page.
    final HtmlPage page2 = button.click();

    webClient.closeAllWindows();
}

Using a proxy server:

@Test
public void homePage_proxy() throws Exception {
    final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_2, "http://myproxyserver", myProxyPort);

    //set proxy username and password 
    final DefaultCredentialsProvider credentialsProvider = (DefaultCredentialsProvider) webClient.getCredentialsProvider();
    credentialsProvider.addProxyCredentials("username", "password");

    final HtmlPage page = webClient.getPage("http://htmlunit.sourceforge.net");
    assertEquals("HtmlUnit - Welcome to HtmlUnit", page.getTitleText());

    webClient.closeAllWindows();
}

A WebClient instance is not thread-safe, so every thread that works with web pages needs its own WebClient instance.
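One way to give every worker thread its own client is a `ThreadLocal`. The following is a minimal sketch of that pattern using only the JDK. `ScraperClient` is a hypothetical stand-in for a client that must stay on one thread (it is not the HtmlUnit API), and the URLs are placeholders.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class PerThreadClientSketch {

    // Hypothetical stand-in for a client that must stay on one thread,
    // such as HtmlUnit's WebClient. It is NOT the real HtmlUnit API.
    static final class ScraperClient {
        private int requests = 0; // unsynchronized state, only safe when confined to one thread

        String fetch(String url) {
            requests++;
            return "request #" + requests + " for " + url;
        }
    }

    // Each thread lazily gets its own client the first time it calls get().
    private static final ThreadLocal<ScraperClient> CLIENT =
            ThreadLocal.withInitial(ScraperClient::new);

    static ScraperClient clientForCurrentThread() {
        return CLIENT.get();
    }

    // Runs `tasks` fetches on a pool of `threads` workers and reports
    // how many distinct client instances were used.
    static int countDistinctClients(int threads, int tasks) {
        Set<ScraperClient> seen = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < tasks; i++) {
            final String url = "http://some_url/page" + i; // placeholder URL
            pool.submit(() -> {
                ScraperClient client = clientForCurrentThread();
                client.fetch(url);
                seen.add(client);
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("distinct clients: " + countDistinctClients(4, 100));
    }
}
```

The number of clients created never exceeds the number of worker threads, no matter how many pages are fetched. With a real WebClient you would also want to close each client when its thread is done.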

Unless you need to process JavaScript or CSS, you can also disable both when you create the client:

WebClient client = new WebClient();
client.setJavaScriptEnabled(false);
client.setCssEnabled(false);

Iain Samuel McLean Elder answered Nov 08 '22 06:11


HTTPUnit is meant for testing; I don't think it is well suited to being embedded inside your application.

When you want to consume HTTP resources (like web pages), I'd recommend Apache HTTPClient. But you may find this framework too low-level for your use case, which is web page scraping, so I'd recommend an integration framework like Apache Camel for this purpose. For example, the following route reads a web page (using Apache HTTPClient), converts the HTML to well-formed HTML (using TagSoup), and transforms the result into an XML representation for further processing.

from("http://mycollege.edu/somepage.html").unmarshal().tidyMarkup().to("xslt:mystylesheet.xsl");

You can further process the resulting XML using XPath or transform it to a POJO using JAXB for example.
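As a rough illustration of the XPath step, here is a minimal sketch that uses only the JDK's `javax.xml.xpath` API. The sample markup, the `grades` table id and the `selectText` helper are assumptions made up for this example, and it assumes the XML carries no namespaces.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathSketch {

    // Evaluates an XPath expression against well-formed XML and returns
    // the trimmed text content of every matching node, in document order.
    static List<String> selectText(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression,
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent().trim());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath expression or malformed XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample standing in for the tidied page.
        String xml = "<html><body><table id='grades'>"
                + "<tr><td>Algebra</td><td>A</td></tr>"
                + "<tr><td>Physics</td><td>B</td></tr>"
                + "</table></body></html>";
        // Select the first cell of every row in the table.
        System.out.println(selectText(xml, "//table[@id='grades']/tr/td[1]"));
    }
}
```

If the tidied output declares the XHTML namespace, the expression would need a namespace context; this sketch leaves that out.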

Richard Kettelerij answered Nov 08 '22 06:11


HTTPUnit is for unit testing. Unless you mean "testing client", I don't think it's appropriate for creating an application.

"I want to develop an HTTP client in Java"

You realize, of course, that the Apache HTTP client is not your answer either. You sound like you want to create a first web app.

You'll need servlets and JSPs. Get Apache Tomcat and learn enough JSP and JSTL to do what you need to do. Don't bother with frameworks, since it's your first web app.

When you have it running, then try a framework like Spring.

duffymo answered Nov 08 '22 05:11