
How can you parse HTML in Android?

I am making an application for Android, and part of its functionality is to return results from an online search of a library's catalogue. The search is carried out through a custom HTML form, and the application needs to display the results in a manner in keeping with the rest of the application. That is, the results page needs to be parsed and only the useful elements displayed. Can this be achieved in Android, and if so, how?

MCHALL asked Aug 18 '11 21:08




1 Answer

You would use an HTML parser. One that I use, and that works very well, is jsoup; that is where I would start for parsing HTML. Jericho HTML Parser is another good one.

You retrieve the HTML document as a DOM tree, then use jsoup's select() method to pick out the elements you want, whether by tag, id, or class.

Solution

Use the Jsoup.connect(String url) method:

 Document doc = Jsoup.connect("http://example.com/").get();

This connects to the HTML page at the given URL and stores it in the Document doc as a DOM tree. You can then read from it using the select() method.

Description

The connect(String url) method creates a new Connection, and get() fetches and parses an HTML document. If an error occurs while fetching the URL, it throws an IOException, which you should handle appropriately.
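
As a rough illustration of handling that IOException, here is a minimal sketch. The class name SafeFetch, the method fetchOrNull, and the 5-second timeout are made up for this example and are not part of jsoup. On Android the call also has to run off the main thread (for instance on a background executor), because network access on the main thread is rejected by the platform.

```java
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class SafeFetch {
    // Hypothetical helper: returns the parsed page, or null if the fetch failed.
    static Document fetchOrNull(String url) {
        try {
            // timeout(5000) is an arbitrary choice for this sketch.
            return Jsoup.connect(url).timeout(5000).get();
        } catch (IOException e) {
            // Connection failures, timeouts, and HTTP error statuses all
            // surface as IOException (or a subclass of it).
            System.err.println("Could not fetch " + url + ": " + e.getMessage());
            return null;
        }
    }
}
```

A caller would check the result for null and show an error message in the UI instead of the search results.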

The Connection interface is designed for method chaining to build specific requests:

 Document doc = Jsoup.connect("http://example.com")

If you read through the jsoup documentation, you should be able to achieve this.
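
The chained call above is cut off, so here is one way such a chain might look for submitting a search form. This is only a sketch: the class CatalogueRequest, the form field name "query", the user agent string, and the 3-second timeout are assumptions for illustration, and the real field names have to be taken from the library's actual HTML form.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;

class CatalogueRequest {
    // Hypothetical helper: builds (but does not send) a POST request
    // that submits a search term to a form URL.
    static Connection build(String formUrl, String term) {
        return Jsoup.connect(formUrl)
                .data("query", term)        // "query" is an assumed form field name
                .userAgent("Mozilla/5.0")   // assumed user agent string
                .timeout(3000)              // milliseconds
                .method(Connection.Method.POST);
    }
}
```

Calling CatalogueRequest.build(url, "java").execute().parse() would then send the request and return the results page as a Document, subject to the same IOException handling as get().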

EDIT: Here is how you would use the select() method:

  // Once the Document has been retrieved as above, use select() to extract
  // the data you want by tag, id, or CSS class.

  Elements links = doc.select("a[href]"); // a with href
  Elements pngs = doc.select("img[src$=.png]"); // img with src ending .png

  Element masthead = doc.select("div.masthead").first(); // div with class=masthead

  Elements resultLinks = doc.select("h3.r > a"); // a that is a direct child of h3 with class=r

EDIT: Using jsoup, you can get attributes, text, and HTML from the selected elements:

  Document doc = Jsoup.connect("http://example.com").get();
  Element link = doc.select("a").first();

  // The values in the comments below assume the page body is:
  // <p>An <a href="http://example.com/"><b>example</b></a> link.</p>
  String text = doc.body().text(); // "An example link."
  String linkHref = link.attr("href"); // "http://example.com/"
  String linkText = link.text(); // "example"

  String linkOuterH = link.outerHtml();
  // "<a href="http://example.com/"><b>example</b></a>"
  String linkInnerH = link.html(); // "<b>example</b>"
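
Putting the selector and attribute calls together for the catalogue case, a minimal sketch might look like the following. The markup it expects (li elements with class "result", each containing a link) and the names ResultParser and parse are invented for illustration; the selector has to be adapted to whatever the real results page contains. It parses an HTML string rather than fetching a URL, so it can be tried without a network connection.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class ResultParser {
    // Hypothetical helper: returns one "title -> absolute URL" string
    // per search result found in the given HTML.
    static List<String> parse(String html, String baseUri) {
        // baseUri lets jsoup resolve relative links such as "/book/1".
        Document doc = Jsoup.parse(html, baseUri);
        List<String> results = new ArrayList<>();
        // Assumed markup: <li class="result"><a href="...">Title</a></li>
        for (Element link : doc.select("li.result > a[href]")) {
            results.add(link.text() + " -> " + link.absUrl("href"));
        }
        return results;
    }
}
```

The returned strings can then be handed to a ListView or RecyclerView adapter so the results match the look of the rest of the application.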
yoshi24 answered Oct 16 '22 19:10
