
Extract links from a web page using Go lang

I am learning Google's Go programming language. Does anyone know the best practice for extracting all URLs from an HTML web page?

Coming from the Java world, I am used to libraries that do this job, for example jsoup and htmlparser. For Go, though, I guess no similar library is available yet?

Jifeng Zhang asked Jun 18 '12 10:06





2 Answers

If you know jQuery, you'll love GoQuery.

Honestly, it's the easiest, most powerful HTML utility I've found in Go, and it's built on the html package from the go.net repository. (Okay, it's higher-level than a plain parser, since it doesn't expose raw HTML tokens and the like, but if you want to actually get anything done with an HTML document, this package will help.)

Matt answered Sep 23 '22 11:09



Go's standard package for HTML parsing is still a work in progress and is not part of the current release. A third-party package you might try instead is go-html-transform, which is being actively maintained.

Sonia answered Sep 19 '22 11:09
