I have an HTML table that I would like to parse, something like the one at the following link:
http://sprunge.us/IJUC
However, I'm not sure of a good way to extract the information. I've seen a couple of HTML parsers, but they seem to require that every piece of info you want to grab has its own special tag; the majority of my info just sits inside plain <td></td> cells.
Does anyone have a suggestion for parsing this information out?
The goquery module lets you select and extract content from an HTML document using a familiar syntax borrowed from CSS selectors. Web scraping, also known as web data extraction, is the automated extraction of data or content from web pages without manual intervention.
Shameless plug: My goquery library. It's the jQuery syntax brought to Go (requires Go's experimental html package, see instructions in the README of the library).
So you can do things like this (assuming your HTML document is loaded in doc, a *goquery.Document):
    doc.Find("td").Each(func(i int, s *goquery.Selection) {
        fmt.Printf("Content of cell %d: %s\n", i, s.Text())
    })
Edit: Changed doc.Root.Find to doc.Find in the example, since a goquery Document is now a Selection too (new in v0.2/master branch).