
Tour of Go exercise #10: Crawler

I'm going through the Go Tour and I feel like I have a pretty good understanding of the language except for concurrency.

Slide 10 is an exercise that asks the reader to parallelize a web crawler (and to make it not fetch the same URL twice, but I haven't gotten there yet).

Here is what I have so far:

func Crawl(url string, depth int, fetcher Fetcher, ch chan string) {
    if depth <= 0 {
        return
    }

    body, urls, err := fetcher.Fetch(url)
    if err != nil {
        ch <- fmt.Sprintln(err)
        return
    }

    ch <- fmt.Sprintf("found: %s %q\n", url, body)
    for _, u := range urls {
        go Crawl(u, depth-1, fetcher, ch)
    }
}

func main() {
    ch := make(chan string, 100)
    go Crawl("http://golang.org/", 4, fetcher, ch)

    for i := range ch {
        fmt.Println(i)
    }
}

My question is: where do I put the close(ch) call?

If I put a defer close(ch) somewhere in the Crawl function, the program ends up writing to a closed channel from one of the spawned goroutines, because the call to Crawl returns before the spawned goroutines do.

If I omit the call to close(ch), as shown above, the program deadlocks in main while ranging over the channel, because the channel is never closed once all the goroutines have returned.
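To make that deadlock concrete, here is a minimal, self-contained sketch (independent of the crawler code) of ranging over a channel that nobody ever closes:

package main

import "fmt"

func main() {
    ch := make(chan string, 1)
    ch <- "only value"

    // The range loop keeps waiting for more values after the buffered one
    // is received; with no other goroutine left to send or to close(ch),
    // the runtime reports "all goroutines are asleep - deadlock!".
    for s := range ch {
        fmt.Println(s)
    }
}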

David Mason asked Nov 04 '12 at 09:11

2 Answers

A look at the Parallelization section of Effective Go leads to ideas for the solution. Essentially you have to close the channel on each return path of the function. Actually this is a nice use case for the defer statement:

func Crawl(url string, depth int, fetcher Fetcher, ret chan string) {
    defer close(ret)
    if depth <= 0 {
        return
    }

    body, urls, err := fetcher.Fetch(url)
    if err != nil {
        ret <- err.Error()
        return
    }

    ret <- fmt.Sprintf("found: %s %q", url, body)

    result := make([]chan string, len(urls))
    for i, u := range urls {
        result[i] = make(chan string)
        go Crawl(u, depth-1, fetcher, result[i])
    }

    for i := range result {
        for s := range result[i] {
            ret <- s
        }
    }

    return
}

func main() {
    result := make(chan string)
    go Crawl("http://golang.org/", 4, fetcher, result)

    for s := range result {
        fmt.Println(s)
    }
}

The essential difference from your code is that every instance of Crawl gets its own return channel, and the caller collects the results from its children's channels into its own return channel.
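As an aside, if you want to keep a single shared channel as in the question, a common alternative pattern is to count the outstanding goroutines with a sync.WaitGroup and close the channel exactly once, from a goroutine that waits for all of them. The sketch below is only an illustration of that pattern; it assumes the Tour's Fetcher interface and fetcher value are in scope, exactly as in the exercise scaffold:

var wg sync.WaitGroup

func Crawl(url string, depth int, fetcher Fetcher, ch chan string) {
    defer wg.Done()
    if depth <= 0 {
        return
    }

    body, urls, err := fetcher.Fetch(url)
    if err != nil {
        ch <- err.Error()
        return
    }

    ch <- fmt.Sprintf("found: %s %q", url, body)
    for _, u := range urls {
        wg.Add(1)
        go Crawl(u, depth-1, fetcher, ch)
    }
}

func main() {
    ch := make(chan string)

    wg.Add(1)
    go Crawl("http://golang.org/", 4, fetcher, ch)

    // Close the channel exactly once, after every Crawl goroutine has
    // returned; the range in main then terminates cleanly.
    go func() {
        wg.Wait()
        close(ch)
    }()

    for s := range ch {
        fmt.Println(s)
    }
}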

fasmat answered Sep 29 '22 at 12:09

I went in a completely different direction with this one. I might have been misled by the tip about using a map.

// SafeUrlMap is safe to use concurrently.
type SafeUrlMap struct {
    v   map[string]string
    mux sync.Mutex
}

func (c *SafeUrlMap) Set(key string, body string) {
    c.mux.Lock()
    // Lock so only one goroutine at a time can access the map c.v.
    c.v[key] = body
    c.mux.Unlock()
}

// Value returns the mapped value for the given key.
func (c *SafeUrlMap) Value(key string) (string, bool) {
    c.mux.Lock()
    // Lock so only one goroutine at a time can access the map c.v.
    defer c.mux.Unlock()
    val, ok := c.v[key]
    return val, ok
}

// Crawl uses fetcher to recursively crawl
// pages starting with url, to a maximum of depth.
// urlMap is passed as a pointer so every goroutine shares the same mutex
// and map (passing the struct by value would copy the lock).
func Crawl(url string, depth int, fetcher Fetcher, urlMap *SafeUrlMap) {
    defer wg.Done()

    if depth <= 0 {
        return
    }

    body, urls, err := fetcher.Fetch(url)
    if err != nil {
        fmt.Println(err)
        return
    }

    // Record the fetched page once the body is available.
    urlMap.Set(url, body)

    for _, u := range urls {
        if _, ok := urlMap.Value(u); !ok {
            wg.Add(1)
            go Crawl(u, depth-1, fetcher, urlMap)
        }
    }

    return
}

var wg sync.WaitGroup

func main() {
    urlMap := &SafeUrlMap{v: make(map[string]string)}

    wg.Add(1)
    go Crawl("http://golang.org/", 4, fetcher, urlMap)
    wg.Wait()

    for url := range urlMap.v {
        body, _ := urlMap.Value(url)
        fmt.Printf("found: %s %q\n", url, body)
    }
}
whossname answered Sep 29 '22 at 14:09