
Scrape data from HTML pages using Java, output to database [closed]

Tags: java, scraper

I need to know how to create a scraper in Java that gathers data from HTML pages and writes it to a database. I don't have a clue where to start, so any information you can give me would be great. No answer is too basic or simple here. Thanks :)

Tanith asked Feb 14 '26 18:02


1 Answer

First, get familiar with an HTML DOM parser for Java, such as JTidy. It lets you extract the data you want from an HTML file. Once you have that data, you can use JDBC to insert it into the database.
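
As a rough illustration of the parse-then-store flow described above, here is a minimal sketch. To stay free of external dependencies it uses the HTML parser bundled with the JDK (`javax.swing.text.html`) rather than JTidy; that parser is dated, so a real scraper would more likely use JTidy or a similar library. The class name `LinkScraper`, the `links` table, and its `href` and `link_text` columns are assumptions made for this example only.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class LinkScraper {

    /** One extracted link: the href attribute plus the visible link text. */
    static final class Link {
        final String href;
        final String text;

        Link(String href, String text) {
            this.href = href;
            this.text = text;
        }
    }

    /** Parses an HTML string and returns every <a> element that has an href. */
    static List<Link> extractLinks(String html) throws IOException {
        final List<Link> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private String currentHref; // non-null only while inside an <a href=...>
            private final StringBuilder text = new StringBuilder();

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    currentHref = (href == null) ? null : href.toString();
                    text.setLength(0);
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (currentHref != null) {
                    text.append(data);
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.A && currentHref != null) {
                    links.add(new Link(currentHref, text.toString().trim()));
                    currentHref = null;
                }
            }
        };
        try (Reader reader = new StringReader(html)) {
            // true = ignore any charset declared in the document; we already have a String
            new ParserDelegator().parse(reader, callback, true);
        }
        return links;
    }

    /**
     * Stores the links through JDBC using a parameterized, batched INSERT.
     * Assumes a hypothetical table: links(href, link_text).
     */
    static void saveLinks(Connection conn, List<Link> links) throws SQLException {
        String sql = "INSERT INTO links (href, link_text) VALUES (?, ?)";
        try (PreparedStatement stmt = conn.prepareStatement(sql)) {
            for (Link link : links) {
                stmt.setString(1, link.href);
                stmt.setString(2, link.text);
                stmt.addBatch();
            }
            stmt.executeBatch();
        }
    }
}
```

To use it, fetch a page into a string, call `LinkScraper.extractLinks(html)`, and pass the result to `saveLinks` together with a `Connection` obtained from `DriverManager.getConnection(...)` using the JDBC URL of your database. The matching JDBC driver must be on the classpath.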

It might be tempting to use regular expressions for this job, but don't. HTML is not a regular language, so regexes are not the right tool.
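
To make the problem concrete, here is a small sketch of how a regex goes wrong on nested or repeated elements. The class and method names are made up for this example. Neither the lazy nor the greedy pattern can track which closing tag belongs to which opening tag.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexNestingDemo {

    /** Lazy match: stops at the FIRST closing tag, even if it belongs to a nested div. */
    static String lazyDiv(String html) {
        Matcher m = Pattern.compile("<div>(.*?)</div>", Pattern.DOTALL).matcher(html);
        return m.find() ? m.group(1) : null;
    }

    /** Greedy match: runs to the LAST closing tag, swallowing sibling divs. */
    static String greedyDiv(String html) {
        Matcher m = Pattern.compile("<div>(.*)</div>", Pattern.DOTALL).matcher(html);
        return m.find() ? m.group(1) : null;
    }
}
```

For the nested input `<div>outer <div>inner</div> tail</div>`, `lazyDiv` returns `outer <div>inner`, which cuts the outer element short. For the sibling input `<div>a</div><div>b</div>`, `greedyDiv` returns `a</div><div>b`, which merges two elements into one. A DOM parser handles both cases because it builds the actual element tree.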

codaddict answered Feb 16 '26 11:02


