I am loading web pages using a simple thread pool while dynamically reading URLs from a file. But this small program slowly allocates as much memory as my server has, until the OOM killer stops it. It looks like resp.Body.Close() doesn't free the memory for the body text (memory used ≈ downloaded pages * average page size). How can I force Go to free the memory allocated for the body HTML text?
package main

import (
    "bufio"
    "fmt"
    "io/ioutil"
    "net/http"
    "os"
    "strings"
    "sync"
)

func worker(linkChan chan string, wg *sync.WaitGroup) {
    defer wg.Done()
    for url := range linkChan {
        // Getting body text
        resp, err := http.Get(url)
        if err != nil {
            fmt.Printf("Fail url: %s\n", url)
            continue
        }
        body, err := ioutil.ReadAll(resp.Body)
        resp.Body.Close()
        if err != nil {
            fmt.Printf("Fail url: %s\n", url)
            continue
        }
        // Test page body
        has_rem_code := strings.Contains(string(body), "googleadservices.com/pagead/conversion.js")
        fmt.Printf("Done url: %s\t%t\n", url, has_rem_code)
    }
}

func main() {
    // Creating worker pool
    lCh := make(chan string, 30)
    wg := new(sync.WaitGroup)
    for i := 0; i < 30; i++ {
        wg.Add(1)
        go worker(lCh, wg)
    }
    // Opening file with urls
    file, err := os.Open("./tmp/new.csv")
    if err != nil {
        panic(err)
    }
    defer file.Close()
    reader := bufio.NewReader(file)
    // Processing urls
    for href, _, err := reader.ReadLine(); err == nil; href, _, err = reader.ReadLine() {
        lCh <- string(href)
    }
    close(lCh)
    wg.Wait()
}
Here is some output from the pprof tool:
      flat  flat%   sum%        cum   cum%
   34.63MB 29.39% 29.39%    34.63MB 29.39%  bufio.NewReaderSize
      30MB 25.46% 54.84%       30MB 25.46%  net/http.(*Transport).getIdleConnCh
   23.09MB 19.59% 74.44%    23.09MB 19.59%  bufio.NewWriter
   11.63MB  9.87% 84.30%    11.63MB  9.87%  net/http.(*Transport).putIdleConn
    6.50MB  5.52% 89.82%     6.50MB  5.52%  main.main
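(For reference, a heap profile like this can be written from inside the program and inspected with go tool pprof; a minimal sketch, with a hypothetical dumpHeapProfile helper that is not part of my program:)

import (
    "os"
    "runtime/pprof"
)

// dumpHeapProfile writes the current heap allocation profile to path,
// for later inspection with `go tool pprof`.
func dumpHeapProfile(path string) error {
    f, err := os.Create(path)
    if err != nil {
        return err
    }
    defer f.Close()
    return pprof.WriteHeapProfile(f)
}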
It looks like this issue, but that was fixed two years ago.
The Go runtime claims it returned all but 94MB to the OS (the scvg2 lines in the GC trace). Maybe my earlier hunch was correct, or maybe the "memory" reported is virtual rather than physical.
Reading the runtime source, runtime.GC() does not release memory to the OS; it keeps the freed memory handy until the scavenger decides it should be returned. In my runs the scavenger didn't notice the extra 1686 and 1625 MB of memory until 8-10 minutes into program execution.
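For what it's worth, the runtime can be told not to wait for the scavenger: runtime/debug.FreeOSMemory() forces a garbage collection and returns as much memory to the OS as possible. A minimal sketch (the one-minute interval is an arbitrary choice of mine), run from somewhere like main:

import (
    "runtime/debug"
    "time"
)

// Periodically hand freed memory back to the OS instead of waiting
// the several minutes the scavenger can take.
go func() {
    for range time.Tick(time.Minute) {
        debug.FreeOSMemory()
    }
}()

Note this only returns already-freed memory sooner; it does nothing about memory the program is still holding on to.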
Found the answer in this thread on golang-nuts: http.Transport
caches connections for future reuse in case of another request to the same host, which caused memory bloat in my case (hundreds of thousands of different hosts). Disabling keep-alives entirely solves the problem.
Working code:
func worker(linkChan chan string, wg *sync.WaitGroup) {
    defer wg.Done()
    // Disable keep-alives so the transport doesn't cache an idle
    // connection for every distinct host it has ever talked to.
    var transport http.RoundTripper = &http.Transport{
        DisableKeepAlives: true,
    }
    c := &http.Client{Transport: transport}
    for url := range linkChan {
        // Getting body text
        resp, err := c.Get(url)
        if err != nil {
            fmt.Printf("Fail url: %s\n", url)
            continue
        }
        body, err := ioutil.ReadAll(resp.Body)
        resp.Body.Close()
        if err != nil {
            fmt.Printf("Fail url: %s\n", url)
            continue
        }
        // Test page body
        has_rem_code := strings.Contains(string(body), "googleadservices.com/pagead/conversion.js")
        fmt.Printf("Done url: %s\t%t\n", url, has_rem_code)
    }
}
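Since http.Client is documented as safe for concurrent use by multiple goroutines, a variant (untested against my setup) is to build the client once and share it across all 30 workers instead of creating one per goroutine:

// One client shared by every worker.
var client = &http.Client{
    Transport: &http.Transport{
        DisableKeepAlives: true, // one connection per request, none cached
    },
}

Each worker would then call client.Get(url) instead of constructing its own transport.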