I'm going through the Go Tour and I feel like I have a pretty good understanding of the language except for concurrency.
Slide 10 is an exercise that asks the reader to parallelize a web crawler (and to make it avoid fetching the same URL twice, but I haven't gotten to that part yet).
Here is what I have so far:
```go
func Crawl(url string, depth int, fetcher Fetcher, ch chan string) {
	if depth <= 0 {
		return
	}
	body, urls, err := fetcher.Fetch(url)
	if err != nil {
		ch <- fmt.Sprintln(err)
		return
	}
	ch <- fmt.Sprintf("found: %s %q\n", url, body)
	for _, u := range urls {
		go Crawl(u, depth-1, fetcher, ch)
	}
}

func main() {
	ch := make(chan string, 100)
	go Crawl("http://golang.org/", 4, fetcher, ch)
	for i := range ch {
		fmt.Println(i)
	}
}
```
My question is: where do I put the close(ch) call?

If I put a defer close(ch) somewhere in the Crawl function, the program ends up writing to a closed channel from one of the spawned goroutines, because the call to Crawl returns before the spawned goroutines do.

If I omit the call to close(ch), as in the code above, the program deadlocks in main while ranging over the channel, because the channel is never closed once all goroutines have returned.
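For reference, here is a stripped-down toy sketch of the first failure mode (not the crawler itself, and the names are made up). Depending on which goroutine wins the race, it may print the child's message, print nothing, or panic with "send on closed channel":

```go
package main

import "fmt"

func main() {
	ch := make(chan string, 1)

	go func() {
		// Closing here mirrors a defer close(ch) at the top of Crawl:
		// it runs when the parent returns, not when the children finish.
		defer close(ch)

		// The child may outlive the parent and still try to send on ch.
		go func() {
			ch <- "from child" // may panic: send on closed channel
		}()
	}()

	for s := range ch {
		fmt.Println(s)
	}
}
```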
A look at the Parallelization section of Effective Go leads to ideas for the solution. Essentially, you have to close the channel on each return route of the function. Actually, this is a nice use case for the defer statement:
```go
func Crawl(url string, depth int, fetcher Fetcher, ret chan string) {
	defer close(ret)
	if depth <= 0 {
		return
	}

	body, urls, err := fetcher.Fetch(url)
	if err != nil {
		ret <- err.Error()
		return
	}

	ret <- fmt.Sprintf("found: %s %q", url, body)

	result := make([]chan string, len(urls))
	for i, u := range urls {
		result[i] = make(chan string)
		go Crawl(u, depth-1, fetcher, result[i])
	}

	for i := range result {
		for s := range result[i] {
			ret <- s
		}
	}

	return
}

func main() {
	result := make(chan string)
	go Crawl("http://golang.org/", 4, fetcher, result)

	for s := range result {
		fmt.Println(s)
	}
}
```
The essential difference from your code is that every instance of Crawl gets its own return channel, and each calling instance drains its children's channels into its own return channel before its deferred close runs.
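Distilled away from the crawler, the pattern is: give each callee its own channel, close it with defer inside the callee, and have the caller forward everything from the children's channels into its own. A minimal sketch with made-up names (work, children) and a fixed fan-out instead of fetched URLs:

```go
package main

import "fmt"

// work sends its own result on out, forwards everything its
// children produce, and closes out only when all of that is done.
func work(n int, out chan string) {
	defer close(out)
	out <- fmt.Sprintf("level %d", n)
	if n == 0 {
		return
	}

	// One fresh channel per child, exactly as in Crawl above.
	children := make([]chan string, 2)
	for i := range children {
		children[i] = make(chan string)
		go work(n-1, children[i])
	}

	// Ranging over a child's channel only finishes once that child
	// has closed it, so out is closed after all descendants finish.
	for _, c := range children {
		for s := range c {
			out <- s
		}
	}
}

func main() {
	out := make(chan string)
	go work(2, out)
	for s := range out {
		fmt.Println(s)
	}
}
```

The cost of this design is one channel (and one forwarding loop) per call, but there is no shared state to protect, and main's range terminates cleanly because the top-level channel is closed exactly once, after everything below it has finished.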
I went in a completely different direction with this one. I might have been misled by the Tour's hint about using a map.
```go
// SafeUrlMap is safe to use concurrently.
type SafeUrlMap struct {
	v   map[string]string
	mux sync.Mutex
}

func (c *SafeUrlMap) Set(key string, body string) {
	c.mux.Lock()
	// Lock so only one goroutine at a time can access the map c.v.
	c.v[key] = body
	c.mux.Unlock()
}

// Value returns the mapped value for the given key.
func (c *SafeUrlMap) Value(key string) (string, bool) {
	c.mux.Lock()
	// Lock so only one goroutine at a time can access the map c.v.
	defer c.mux.Unlock()
	val, ok := c.v[key]
	return val, ok
}

// Crawl uses fetcher to recursively crawl
// pages starting with url, to a maximum of depth.
func Crawl(url string, depth int, fetcher Fetcher, urlMap *SafeUrlMap) {
	defer wg.Done()

	if depth <= 0 {
		return
	}

	body, urls, err := fetcher.Fetch(url)
	if err != nil {
		fmt.Println(err)
		return
	}

	urlMap.Set(url, body)

	for _, u := range urls {
		if _, ok := urlMap.Value(u); !ok {
			wg.Add(1)
			go Crawl(u, depth-1, fetcher, urlMap)
		}
	}
}

var wg sync.WaitGroup

func main() {
	urlMap := &SafeUrlMap{v: make(map[string]string)}

	wg.Add(1)
	go Crawl("http://golang.org/", 4, fetcher, urlMap)
	wg.Wait()

	for url := range urlMap.v {
		body, _ := urlMap.Value(url)
		fmt.Printf("found: %s %q\n", url, body)
	}
}
```
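The synchronization here is the usual recursive sync.WaitGroup pattern: the parent calls wg.Add(1) before each go statement, every Crawl defers wg.Done(), and main blocks on wg.Wait() until the counter drains. Stripped of the map and the fetcher, the skeleton looks roughly like this (a toy sketch with a made-up visit function, not the code above):

```go
package main

import (
	"fmt"
	"sync"
)

var wg sync.WaitGroup

// visit stands in for Crawl: it registers each child on the
// WaitGroup *before* spawning it, so the counter can never
// reach zero while work is still outstanding.
func visit(n int) {
	defer wg.Done()
	fmt.Println("visiting level", n)
	if n == 0 {
		return
	}
	for i := 0; i < 2; i++ {
		wg.Add(1) // Add in the parent, not inside the new goroutine.
		go visit(n - 1)
	}
}

func main() {
	wg.Add(1)
	go visit(2)
	wg.Wait() // Returns only after every spawned visit has finished.
}
```

One consequence of this design, compared with the channel-based answer, is that results are only read out of the map after wg.Wait() returns, so nothing is printed while the crawl is still in progress.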