
Why do I get "net/http: request canceled while waiting for connection" when I try to fetch some images with "net/http"?

I'm writing a web crawler in Go to collect images from the Internet. It works most of the time, but it sometimes fails to fetch an image.

Here's my snippet:

package main

import (
    "fmt"
    "net/http"
    "time"
)

func main() {
    var client http.Client
    var resp *http.Response

    // var imageUrl = "https://i.stack.imgur.com/tKsDb.png"  // It works well
    var imageUrl = "https://precious.jp/mwimgs/b/1/-/img_b1ec6cf54ff3a4260fb77d3d3de918a5275780.jpg"  // It fails

    req, _ := http.NewRequest("GET", imageUrl, nil)
    req.Header.Add("User-Agent", "My Test")

    client.Timeout = 3 * time.Second
    resp, err := client.Do(req)
    if err != nil {
        fmt.Println(err.Error())  // Fails here
        return
    }
    defer resp.Body.Close()

    if resp.StatusCode != http.StatusOK {
        fmt.Printf("Failure: %d\n", resp.StatusCode)
    } else {
        fmt.Printf("Success: %d\n", resp.StatusCode)
    }

    fmt.Println("Done")
}

My snippet above works for most URLs (e.g. "https://i.stack.imgur.com/tKsDb.png"), but it fails for URLs such as "https://precious.jp/mwimgs/b/1/-/img_b1ec6cf54ff3a4260fb77d3d3de918a5275780.jpg". The error message returned by err.Error() is:

Get https://precious.jp/mwimgs/b/1/-/img_b1ec6cf54ff3a4260fb77d3d3de918a5275780.jpg: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

My Go version is "go1.9.3 darwin/amd64". I can fetch the image with Google Chrome and with curl, so I don't think my IP address is blocked. I've also changed the User-Agent to look like a real browser's, but still no luck.

What's wrong with my code? Or is the administrator of precious.jp doing some magic to block my access?

Sa Oh asked Jan 30 '18 08:01


1 Answer

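Create the http.Client with a custom transport and configure TLS explicitly (see http.Transport). Be aware that InsecureSkipVerify: true turns off certificate verification, so it is only appropriate for testing. For example: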

package main

import (
    "crypto/tls"
    "fmt"
    "net/http"
    "time"
)

func main() {
    //---------------------- Modification ----------------------
    // Configure TLS. InsecureSkipVerify disables certificate
    // verification; use it for testing only.
    tr := &http.Transport{
        TLSClientConfig: &tls.Config{
            InsecureSkipVerify: true,
        },
    }
    client := &http.Client{
        Transport: tr,
        Timeout:   3 * time.Second,
    }
    //---------------------- End of Modification ----------------

    // var imageUrl = "https://i.stack.imgur.com/tKsDb.png"  // It works well
    var imageUrl = "https://precious.jp/mwimgs/b/1/-/img_b1ec6cf54ff3a4260fb77d3d3de918a5275780.jpg" // It fails

    req, _ := http.NewRequest("GET", imageUrl, nil)
    req.Header.Add("User-Agent", "My Test")

    resp, err := client.Do(req)
    if err != nil {
        fmt.Println(err.Error()) // Fails here
        return
    }
    defer resp.Body.Close()

    if resp.StatusCode != http.StatusOK {
        fmt.Printf("Failure: %d\n", resp.StatusCode)
    } else {
        fmt.Printf("Success: %d\n", resp.StatusCode)
    }

    fmt.Println("Done")
}
putu answered Nov 02 '22 04:11