Can anybody tell me how to parse HTML content as XML using TagSoup on Android? I am looking for working code examples if possible.
// Create a SAX XMLReader backed by TagSoup, which reports messy HTML as well-formed XML events
XMLReader xmlReader = XMLReaderFactory.createXMLReader("org.ccil.cowan.tagsoup.Parser");
ContentHandler handler = new DefaultHandler() {
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException
    {
        // ...
    }
};
xmlReader.setContentHandler(handler);
// input is the HTML source, supplied as an InputStream or a Reader
xmlReader.parse(new InputSource(input));
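The handler body above is left as "// ...", so here is a minimal sketch of one thing you might put there: collecting the href of every link. LinkCollector and its methods are hypothetical names, not part of TagSoup or of the answer above. To keep the sketch runnable without TagSoup on the classpath, collectWithJdkParser feeds the handler from the JDK's built-in SAX parser, which only accepts well-formed markup. With TagSoup you would pass the TagSoup XMLReader to collect() instead.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler (an illustration only): records the href of every <a> element.
class LinkCollector extends DefaultHandler {
    private final List<String> links = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // localName is empty when the reader is not namespace-aware, so fall back to qName.
        String name = localName.isEmpty() ? qName : localName;
        if ("a".equalsIgnoreCase(name)) {
            // Look the attribute up by (namespace URI, local name); plain HTML attributes have no namespace.
            String href = attributes.getValue("", "href");
            if (href != null) {
                links.add(href);
            }
        }
    }

    List<String> getLinks() {
        return links;
    }

    // Runs the handler over the markup using whatever XMLReader the caller supplies,
    // for example one created with "org.ccil.cowan.tagsoup.Parser".
    static List<String> collect(XMLReader reader, String markup) throws IOException, SAXException {
        LinkCollector handler = new LinkCollector();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(markup)));
        return handler.getLinks();
    }

    // Demo path only: uses the JDK's SAX parser, so the input must already be well-formed.
    static List<String> collectWithJdkParser(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            return collect(reader, xml);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

The point of passing the XMLReader in is that the handler does not care which parser drives it; only the reader you construct changes between strict XML and tag soup.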
The code below fetches a web page and runs it through TagSoup to build a XOM Document, which you can then query.
HttpClient client = new DefaultHttpClient();
HttpGet request = new HttpGet("http://streak.espn.go.com/en/?date=20120824");
HttpResponse response = client.execute(request);
// Check if server response is valid
StatusLine status = response.getStatusLine();
if (status.getStatusCode() != 200) {
throw new IOException("Invalid response from server: " + status.toString());
}
// Pull content stream from response
HttpEntity entity = response.getEntity();
InputStream inputStream = entity.getContent();
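try
{
    XMLReader parser = XMLReaderFactory.createXMLReader("org.ccil.cowan.tagsoup.Parser");
    // Use the TagSoup parser to build an XOM document directly from the response stream
    Document doc = new Builder(parser).build(inputStream);
    // Query the document as needed; XOM's query() returns a Nodes collection
    Nodes nodes = doc.query("...");
}
catch(IOException e)
{ ... }
// createXMLReader() and build() can also throw these checked exceptions
catch(SAXException | ParsingException e)
{ ... }
finally
{
    inputStream.close();
}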