I am familier with java programming language I like to extract the data from a website and store it to my database running on my machine.Is that possible in java.If so which API I should use. For example the are number of schools listed on a website How can I extract that data and store it to my database using java.
What you're referring to is commonly called 'screenscraping'. There are a variety of ways to do this in Java, however, I prefer HtmlUnit. While it was designed as a way to test web functionality, you can use it to hit a remote webpage, and parse it out.
I would recommend using a good error handling html parser like Tagsoup to extract from the HTML exactly what you're looking for.
You definitely need a good parser like NekoHTML.
Here's an example of using NekoHTML, albeit using Groovy (a Java-based scripting language) rather than Java itself:
http://www.keplarllp.com/blog/2010/01/better-competitive-intelligence-through-scraping-with-groovy
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With