What is the best way to scrape the below HTML from a web page? I want to pull out Apple, Orange and Grape and put them into a dropdown menu in my Android app. Should I use Jsoup for this, and if so, what would be the best way to do it? Should I use Regex instead?
<select name="fruit" id="fruit" >
<option value="APPLE">Apple</option>
<option value="ORANGE">Orange</option>
<option value="GRAPE">Grape</option>
</select>
Depends, but I'd go with an XML/HTML parser. Don't use regex.
Example with jsoup:
Document doc = Jsoup.connect(someUrl).get();
Elements options = doc.select("select#fruit option");
More on jsoup selector syntax.
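To get from the selected elements to the strings the dropdown needs, something like the following might work. It is only a sketch: the class name `FruitOptions` and the method `optionLabels` are made up for illustration, and it parses an HTML string with `Jsoup.parse` rather than fetching a URL so it can be tried offline.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class, not part of jsoup or of the original answer.
public class FruitOptions {

    // Returns the visible text of every <option> inside <select id="fruit">.
    static List<String> optionLabels(String html) {
        Document doc = Jsoup.parse(html);
        List<String> labels = new ArrayList<>();
        for (Element option : doc.select("select#fruit option")) {
            labels.add(option.text());        // "Apple", not the value attribute "APPLE"
        }
        return labels;
    }

    public static void main(String[] args) {
        String html = "<select name=\"fruit\" id=\"fruit\" >"
                + "<option value=\"APPLE\">Apple</option>"
                + "<option value=\"ORANGE\">Orange</option>"
                + "<option value=\"GRAPE\">Grape</option>"
                + "</select>";
        System.out.println(optionLabels(html));
    }
}
```

On Android the resulting list could then be handed to an `ArrayAdapter<String>` backing a `Spinner`. If you fetch the page with `Jsoup.connect(someUrl).get()` instead, do it off the main thread, since Android does not allow network calls there.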
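I would go with either the built-in DOM parser or the built-in SAX parser. If you're going to be parsing a large document, SAX is usually the better choice, because it streams the input instead of building the whole tree in memory. If the document is small, then there's not much difference. More on SAX vs DOM.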
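As a rough illustration of the SAX route, here is a minimal sketch using only the JDK's `javax.xml.parsers` classes. The class name `SaxFruitOptions` and the method `optionLabels` are made up for this example. It also assumes the markup is well-formed XML, as the snippet in the question happens to be. Real-world HTML often is not, and an XML parser will reject it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class for illustration only.
public class SaxFruitOptions {

    // Collects the text content of every <option> element in a well-formed document.
    static List<String> optionLabels(String xml) throws Exception {
        List<String> labels = new ArrayList<>();

        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text;   // non-null only while inside an <option>

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("option".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length);   // may be called several times per element
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("option".equals(qName)) {
                    labels.add(text.toString().trim());
                    text = null;
                }
            }
        };

        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return labels;
    }
}
```

The handler only keeps the text of the element it is currently inside, which is why SAX stays cheap on large inputs. The price is that you track parsing state yourself instead of querying a tree.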
For HTML parsing you can use jsoup. It is very easy to use and the API is great.
http://jsoup.org/
For me it worked great!
EDIT: too slow :D skyuzo's post is great :)