
How to parse HTML content as XML using TagSoup on Android

Can anybody tell me how to parse HTML content as XML using TagSoup within Android? I am looking for functional code examples if possible.

user386430 asked Sep 13 '11 06:09


2 Answers

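    import org.xml.sax.*;
    import org.xml.sax.helpers.*;

    // Ask SAX for TagSoup's parser (the TagSoup jar must be on the classpath);
    // it reports even malformed HTML as a stream of well-formed SAX events
    XMLReader xmlReader = XMLReaderFactory.createXMLReader("org.ccil.cowan.tagsoup.Parser");
    ContentHandler handler = new DefaultHandler() {
        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
            // ...
        }
    };
    xmlReader.setContentHandler(handler);
    // input is the HTML source, for example an InputStream or a Reader
    xmlReader.parse(new InputSource(input));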
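
As one possible way to fill in the `// ...` placeholder, here is a minimal sketch of a handler that collects link targets. `LinkCollector`, `collect` and `jdkReader` are hypothetical names introduced for illustration, not part of TagSoup or SAX. `jdkReader` is only a stand-in so the sketch runs without the TagSoup jar; unlike TagSoup, it accepts well-formed markup only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: records the href of every <a> element it sees. */
class LinkCollector extends DefaultHandler {
    final List<String> links = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // localName is empty when the reader is not namespace-aware, so fall back to qName
        String name = localName.isEmpty() ? qName : localName;
        if ("a".equalsIgnoreCase(name)) {
            String href = attributes.getValue("href");
            if (href != null) {
                links.add(href);
            }
        }
    }

    /** Runs any SAX XMLReader over the markup and returns the collected hrefs. */
    static List<String> collect(XMLReader reader, String markup) throws Exception {
        LinkCollector handler = new LinkCollector();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(markup)));
        return handler.links;
    }

    /** Stand-in reader from the JDK (assumption: used only so the sketch runs without TagSoup). */
    static XMLReader jdkReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }
}
```

With TagSoup on the classpath, the same handler should work on real-world HTML by passing `new org.ccil.cowan.tagsoup.Parser()` (it implements `XMLReader`) to `collect` instead of `jdkReader()`.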
Patrick answered Oct 31 '22 19:10


The code below fetches a web page, builds a XOM Document from it with TagSoup as the underlying parser, and then queries that Document.

    HttpClient client = new DefaultHttpClient();
    HttpGet request = new HttpGet("http://streak.espn.go.com/en/?date=20120824");
    HttpResponse response = client.execute(request);

    // Check if server response is valid
    StatusLine status = response.getStatusLine();
    if (status.getStatusCode() != 200) {
        throw new IOException("Invalid response from server: " + status.toString());
    }

    // Pull content stream from response
    HttpEntity entity = response.getEntity();
    InputStream inputStream = entity.getContent();

    try
    {
        XMLReader parser = XMLReaderFactory.createXMLReader("org.ccil.cowan.tagsoup.Parser");

        // Use the TagSoup parser to build a XOM document directly from the response stream
        Document doc = new Builder(parser).build(inputStream);

        // Query the document as needed; query() returns a Nodes collection
        Nodes nodes = doc.query("...");
    }
    catch(SAXException | ParsingException e)
    { ... }
    catch(IOException e)
    { ... }
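
One thing to watch when writing the XPath for the query: as far as I know, TagSoup reports elements in the XHTML namespace (`http://www.w3.org/1999/xhtml`) by default, and in XPath 1.0 an unprefixed name such as `//a` only matches elements in no namespace. The sketch below shows the effect with the JDK's own `javax.xml.xpath` API as a stand-in for XOM, so it runs without extra jars. `XhtmlQuery`, `count` and the prefix `h` are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XhtmlQuery {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    /** Binds the arbitrary prefix "h" to the XHTML namespace for use in XPath expressions. */
    static final NamespaceContext XHTML_CONTEXT = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return "h".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String namespaceURI) {
            return XHTML_NS.equals(namespaceURI) ? "h" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            String prefix = getPrefix(namespaceURI);
            return prefix == null
                    ? Collections.<String>emptyList().iterator()
                    : Collections.singletonList(prefix).iterator();
        }
    };

    /** Parses well-formed XHTML and returns how many nodes the XPath expression selects. */
    static int count(String xhtml, String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xhtml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(XHTML_CONTEXT);
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }
}
```

For a page whose root element declares `xmlns="http://www.w3.org/1999/xhtml"`, `count(page, "//a")` selects nothing while `count(page, "//h:a")` selects every link. The XOM counterpart would be the two-argument `query`, along the lines of `doc.query("//h:a", new XPathContext("h", "http://www.w3.org/1999/xhtml"))`.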
Aaron McIver answered Oct 31 '22 19:10