
Tour of Go exercise #10: Crawler

I'm going through the Go Tour and I feel like I have a pretty good understanding of the language except for concurrency.

Slide 10 is an exercise that asks the reader to parallelize a web crawler (and to make it not fetch the same URL twice, but I haven't gotten there yet).

Here is what I have so far:

func Crawl(url string, depth int, fetcher Fetcher, ch chan string) {
    if depth <= 0 {
        return
    }

    body, urls, err := fetcher.Fetch(url)
    if err != nil {
        ch <- fmt.Sprintln(err)
        return
    }

    ch <- fmt.Sprintf("found: %s %q\n", url, body)
    for _, u := range urls {
        go Crawl(u, depth-1, fetcher, ch)
    }
}

func main() {
    ch := make(chan string, 100)
    go Crawl("http://golang.org/", 4, fetcher, ch)

    for i := range ch {
        fmt.Println(i)
    }
}

My question is: where do I put the close(ch) call?

If I put a defer close(ch) somewhere in the Crawl function, the program ends up writing to a closed channel from one of the spawned goroutines, because the call to Crawl returns before the spawned goroutines do.

If I omit the call to close(ch), as shown above, the program deadlocks in main while ranging over the channel, because the channel is never closed once all the goroutines have returned.
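To make that deadlock concrete, here is a minimal, self-contained sketch (independent of the crawler code) of ranging over a channel that nobody ever closes:

package main

import "fmt"

func main() {
    ch := make(chan string, 1)
    ch <- "only value"

    // The range loop keeps waiting for more values after the buffered one
    // is received; with no other goroutine left to send or to close(ch),
    // the runtime reports "all goroutines are asleep - deadlock!".
    for s := range ch {
        fmt.Println(s)
    }
}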

David Mason asked Nov 04 '12 at 09:11

2 Answers

A look at the Parallelization section of Effective Go leads to ideas for the solution. Essentially you have to close the channel on each return path of the function. Actually this is a nice use case for the defer statement:

func Crawl(url string, depth int, fetcher Fetcher, ret chan string) {
    defer close(ret)
    if depth <= 0 {
        return
    }

    body, urls, err := fetcher.Fetch(url)
    if err != nil {
        ret <- err.Error()
        return
    }

    ret <- fmt.Sprintf("found: %s %q", url, body)

    result := make([]chan string, len(urls))
    for i, u := range urls {
        result[i] = make(chan string)
        go Crawl(u, depth-1, fetcher, result[i])
    }

    for i := range result {
        for s := range result[i] {
            ret <- s
        }
    }

    return
}

func main() {
    result := make(chan string)
    go Crawl("http://golang.org/", 4, fetcher, result)

    for s := range result {
        fmt.Println(s)
    }
}

The essential difference from your code is that every instance of Crawl gets its own return channel, and the caller collects the results from its children's channels into its own return channel.
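As an aside, if you want to keep a single shared channel as in the question, a common alternative pattern is to count the outstanding goroutines with a sync.WaitGroup and close the channel exactly once, from a goroutine that waits for all of them. The sketch below is only an illustration of that pattern; it assumes the Tour's Fetcher interface and fetcher value are in scope, exactly as in the exercise scaffold:

var wg sync.WaitGroup

func Crawl(url string, depth int, fetcher Fetcher, ch chan string) {
    defer wg.Done()
    if depth <= 0 {
        return
    }

    body, urls, err := fetcher.Fetch(url)
    if err != nil {
        ch <- err.Error()
        return
    }

    ch <- fmt.Sprintf("found: %s %q", url, body)
    for _, u := range urls {
        wg.Add(1)
        go Crawl(u, depth-1, fetcher, ch)
    }
}

func main() {
    ch := make(chan string)

    wg.Add(1)
    go Crawl("http://golang.org/", 4, fetcher, ch)

    // Close the channel exactly once, after every Crawl goroutine has
    // returned; the range in main then terminates cleanly.
    go func() {
        wg.Wait()
        close(ch)
    }()

    for s := range ch {
        fmt.Println(s)
    }
}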

fasmat answered Sep 29 '22 at 12:09

I went in a completely different direction with this one. I might have been misled by the tip about using a map.

// SafeUrlMap is safe to use concurrently.
type SafeUrlMap struct {
    v   map[string]string
    mux sync.Mutex
}

func (c *SafeUrlMap) Set(key string, body string) {
    c.mux.Lock()
    // Lock so only one goroutine at a time can access the map c.v.
    c.v[key] = body
    c.mux.Unlock()
}

// Value returns the mapped value for the given key.
func (c *SafeUrlMap) Value(key string) (string, bool) {
    c.mux.Lock()
    // Lock so only one goroutine at a time can access the map c.v.
    defer c.mux.Unlock()
    val, ok := c.v[key]
    return val, ok
}

// Crawl uses fetcher to recursively crawl
// pages starting with url, to a maximum of depth.
// urlMap is passed as a pointer so every goroutine shares the same mutex
// and map (passing the struct by value would copy the lock).
func Crawl(url string, depth int, fetcher Fetcher, urlMap *SafeUrlMap) {
    defer wg.Done()

    if depth <= 0 {
        return
    }

    body, urls, err := fetcher.Fetch(url)
    if err != nil {
        fmt.Println(err)
        return
    }

    // Record the fetched page once the body is available.
    urlMap.Set(url, body)

    for _, u := range urls {
        if _, ok := urlMap.Value(u); !ok {
            wg.Add(1)
            go Crawl(u, depth-1, fetcher, urlMap)
        }
    }

    return
}

var wg sync.WaitGroup

func main() {
    urlMap := &SafeUrlMap{v: make(map[string]string)}

    wg.Add(1)
    go Crawl("http://golang.org/", 4, fetcher, urlMap)
    wg.Wait()

    for url := range urlMap.v {
        body, _ := urlMap.Value(url)
        fmt.Printf("found: %s %q\n", url, body)
    }
}
whossname answered Sep 29 '22 at 14:09