I am learning Google's Go programming language. Does anyone know the best practice for extracting all URLs from an HTML web page?
Coming from the Java world, I know there are libraries that do the job, for example jsoup, htmlparser, etc. But for Go, I guess no similar library is available yet?
Implementing Web Scraping with Go
Go's support for concurrency makes it a fast, powerful language for this kind of work, and because the language is easy to get started with, you can build a web scraper in only a few lines of code. Two libraries are very popular for writing web scrapers in Go: goquery and Colly.
If you know jQuery, you'll love GoQuery.
Honestly, it's the easiest, most powerful HTML utility I've found in Go, and it's built on the html package in the go.net repository (now golang.org/x/net/html). (Okay, so it's higher-level than just a parser, as it doesn't expose raw HTML tokens and the like, but if you want to actually get anything done with an HTML document, this package will help.)
Go's standard package for HTML parsing is still a work in progress and is not part of the current release. A third-party package you might try, though, is go-html-transform. It is being actively maintained.