I'm writing a web crawler in Go to collect images from the Internet. It works most of the time, but it sometimes fails to fetch an image.
Here's my snippet:
package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	var client http.Client
	var resp *http.Response
	// var imageUrl = "https://i.stack.imgur.com/tKsDb.png" // It works well
	var imageUrl = "https://precious.jp/mwimgs/b/1/-/img_b1ec6cf54ff3a4260fb77d3d3de918a5275780.jpg" // It fails
	req, _ := http.NewRequest("GET", imageUrl, nil)
	req.Header.Add("User-Agent", "My Test")
	client.Timeout = 3 * time.Second
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println(err.Error()) // Fails here
		return
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		fmt.Printf("Failure: %d\n", resp.StatusCode)
	} else {
		fmt.Printf("Success: %d\n", resp.StatusCode)
	}
	fmt.Println("Done")
}
The snippet above works for most URLs (e.g. "https://i.stack.imgur.com/tKsDb.png"), but it fails for URLs such as "https://precious.jp/mwimgs/b/1/-/img_b1ec6cf54ff3a4260fb77d3d3de918a5275780.jpg". The error message returned by err.Error() is:
Get https://precious.jp/mwimgs/b/1/-/img_b1ec6cf54ff3a4260fb77d3d3de918a5275780.jpg: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
My Go version is "go1.9.3 darwin/amd64". I can fetch the image with Google Chrome and with the curl command, so I don't think my IP address is blocked. I've also changed the User-Agent to look like a real browser's, but still no luck.
What's wrong with my code? Or is the administrator of precious.jp doing some magic to block my access?
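The error text shows that the client's own Timeout elapsed before any response headers arrived. Below is a minimal sketch of telling such timeouts apart from other failures. It assumes a local httptest server as a stand-in for a slow host, and the helper names isTimeout and timesOut are hypothetical, not part of the original snippet.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"time"
)

// isTimeout reports whether err is a network error caused by a timeout.
// Errors returned by http.Client methods are *url.Error values, which
// implement net.Error.
func isTimeout(err error) bool {
	ne, ok := err.(net.Error)
	return ok && ne.Timeout()
}

// timesOut (hypothetical helper) starts a local test server that waits
// delayMs before responding, requests it with a client whose Timeout is
// timeoutMs, and reports whether the request failed with a timeout.
func timesOut(delayMs, timeoutMs int) bool {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(time.Duration(delayMs) * time.Millisecond)
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	client := &http.Client{Timeout: time.Duration(timeoutMs) * time.Millisecond}
	resp, err := client.Get(srv.URL)
	if err != nil {
		return isTimeout(err)
	}
	resp.Body.Close()
	return false
}

func main() {
	fmt.Println("slow server timed out:", timesOut(300, 50))
	fmt.Println("fast server timed out:", timesOut(0, 2000))
}
```

A crawler could use a check like isTimeout to decide whether to retry a URL with a longer timeout or to skip it.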
Since you're using https, you need to create an http.Client with a custom transport and configure TLS (see http.Transport), e.g.
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

func main() {
	//---------------------- Modification ----------------------
	// Configure TLS, etc.
	tr := &http.Transport{
		TLSClientConfig: &tls.Config{
			// InsecureSkipVerify disables verification of the server's
			// certificate chain and host name; use it for testing only.
			InsecureSkipVerify: true,
		},
	}
	client := &http.Client{
		Transport: tr,
		Timeout:   3 * time.Second,
	}
	//---------------------- End of Modification ----------------
	// var imageUrl = "https://i.stack.imgur.com/tKsDb.png" // It works well
	var imageUrl = "https://precious.jp/mwimgs/b/1/-/img_b1ec6cf54ff3a4260fb77d3d3de918a5275780.jpg" // It fails
	req, _ := http.NewRequest("GET", imageUrl, nil)
	req.Header.Add("User-Agent", "My Test")
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println(err.Error()) // Fails here
		return
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		fmt.Printf("Failure: %d\n", resp.StatusCode)
	} else {
		fmt.Printf("Success: %d\n", resp.StatusCode)
	}
	fmt.Println("Done")
}
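For comparison, here is a hedged sketch of a custom http.Transport that leaves certificate verification on and only sets per-phase timeouts. The helper name newClient and the specific timeout values are assumptions for illustration. Nothing in this thread confirms that timeouts alone resolve the failure, so treat it as a starting point for experiments rather than a fix.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"time"
)

// newClient (hypothetical helper) builds an http.Client with an explicit
// Transport. TLSClientConfig is left nil, so the default TLS settings
// apply and server certificates are still verified.
func newClient(timeoutSec int) *http.Client {
	total := time.Duration(timeoutSec) * time.Second
	tr := &http.Transport{
		Proxy: http.ProxyFromEnvironment,
		// Limit on establishing the TCP connection (assumed value).
		DialContext: (&net.Dialer{Timeout: 5 * time.Second}).DialContext,
		// Limit on the TLS handshake (assumed value).
		TLSHandshakeTimeout: 5 * time.Second,
		// Limit on waiting for response headers after the request is sent.
		ResponseHeaderTimeout: total,
	}
	// Client.Timeout bounds the whole exchange, including reading the body.
	return &http.Client{Transport: tr, Timeout: total}
}

func main() {
	client := newClient(10)
	fmt.Println("overall timeout:", client.Timeout)
}
```

Separate limits per phase make it easier to see from the error whether connecting, the TLS handshake, or waiting for headers is the slow step.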