I tried Go Tour exercise #71.
If it is run like go run 71_hang.go ok, it works fine.
However, if you run go run 71_hang.go nogood, it will run forever.
The only difference is the extra fmt.Print("") in the default case of the select statement.
I'm not sure, but I suspect some sort of infinite loop or race condition. Here is my solution.
Note: It's not a deadlock, since Go didn't print "throw: all goroutines are asleep - deadlock!"
package main

import (
	"fmt"
	"os"
)

type Fetcher interface {
	// Fetch returns the body of URL and
	// a slice of URLs found on that page.
	Fetch(url string) (body string, urls []string, err error)
}

func crawl(todo Todo, fetcher Fetcher,
	todoList chan Todo, done chan bool) {
	body, urls, err := fetcher.Fetch(todo.url)
	if err != nil {
		fmt.Println(err)
	} else {
		fmt.Printf("found: %s %q\n", todo.url, body)
		for _, u := range urls {
			todoList <- Todo{u, todo.depth - 1}
		}
	}
	done <- true
	return
}

type Todo struct {
	url   string
	depth int
}

// Crawl uses fetcher to recursively crawl
// pages starting with url, to a maximum of depth.
func Crawl(url string, depth int, fetcher Fetcher) {
	visited := make(map[string]bool)
	doneCrawling := make(chan bool, 100)
	toDoList := make(chan Todo, 100)
	toDoList <- Todo{url, depth}
	crawling := 0
	for {
		select {
		case todo := <-toDoList:
			if todo.depth > 0 && !visited[todo.url] {
				crawling++
				visited[todo.url] = true
				go crawl(todo, fetcher, toDoList, doneCrawling)
			}
		case <-doneCrawling:
			crawling--
		default:
			if os.Args[1] == "ok" { // *
				fmt.Print("")
			}
			if crawling == 0 {
				goto END
			}
		}
	}
END:
	return
}

func main() {
	Crawl("http://golang.org/", 4, fetcher)
}

// fakeFetcher is Fetcher that returns canned results.
type fakeFetcher map[string]*fakeResult

type fakeResult struct {
	body string
	urls []string
}

func (f *fakeFetcher) Fetch(url string) (string, []string, error) {
	if res, ok := (*f)[url]; ok {
		return res.body, res.urls, nil
	}
	return "", nil, fmt.Errorf("not found: %s", url)
}

// fetcher is a populated fakeFetcher.
var fetcher = &fakeFetcher{
	"http://golang.org/": &fakeResult{
		"The Go Programming Language",
		[]string{
			"http://golang.org/pkg/",
			"http://golang.org/cmd/",
		},
	},
	"http://golang.org/pkg/": &fakeResult{
		"Packages",
		[]string{
			"http://golang.org/",
			"http://golang.org/cmd/",
			"http://golang.org/pkg/fmt/",
			"http://golang.org/pkg/os/",
		},
	},
	"http://golang.org/pkg/fmt/": &fakeResult{
		"Package fmt",
		[]string{
			"http://golang.org/",
			"http://golang.org/pkg/",
		},
	},
	"http://golang.org/pkg/os/": &fakeResult{
		"Package os",
		[]string{
			"http://golang.org/",
			"http://golang.org/pkg/",
		},
	},
}
Putting a default case in your select changes the way select works. Without a default case, select blocks until one of the channel operations can proceed. With a default case, select runs the default case every time there is nothing to read from the channels. In your code I think this creates an infinite loop. Adding the fmt.Print call gives the scheduler a chance to run other goroutines.
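To make the difference concrete, here is a minimal sketch of a select with a default case. The helper name tryReceive is made up for this illustration and is not part of the original program; it only shows that such a select returns immediately instead of waiting.

```go
package main

import "fmt"

// tryReceive is a hypothetical helper for illustration only.
// It returns immediately, reporting whether a value was waiting on ch.
func tryReceive(ch chan int) (int, bool) {
	select {
	case v := <-ch:
		return v, true
	default:
		// Nothing is ready: select does not block, it takes this branch.
		return 0, false
	}
}

func main() {
	ch := make(chan int, 1)

	_, ok := tryReceive(ch)
	fmt.Println(ok) // false: the channel is empty, default ran

	ch <- 42
	v, ok := tryReceive(ch)
	fmt.Println(v, ok) // 42 true: the receive case was ready
}
```

Calling something like tryReceive in a tight for loop is exactly the busy-wait pattern in the question: every iteration with an empty channel goes straight through default.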
If you change your code like this, it works properly: select now blocks until a channel is ready, which allows the other goroutines to run.
for {
	select {
	case todo := <-toDoList:
		if todo.depth > 0 && !visited[todo.url] {
			crawling++
			visited[todo.url] = true
			go crawl(todo, fetcher, toDoList, doneCrawling)
		}
	case <-doneCrawling:
		crawling--
	}
	if crawling == 0 {
		break
	}
}
You can make your original code work if you use GOMAXPROCS=2 which is another hint that the scheduler is busy in an infinite loop.
Note that goroutines are co-operatively scheduled. What I don't fully understand about your problem is that select is a point where the goroutine should yield - I hope someone else can explain why it doesn't in your example.
You have 100% CPU load because the default case is executed almost every time, over and over again, which is effectively an infinite loop. In this situation the Go scheduler does not hand control to another goroutine, by design. So no other goroutine ever gets the chance to run and send on doneCrawling, crawling never drops back to 0, and you have your infinite loop.
In my opinion you should remove the default case and instead create another channel if you want to play with the select statement.
Otherwise the runtime package helps you to go the dirty way:
runtime.GOMAXPROCS(2) will work (or export GOMAXPROCS=2); this way you will have more than one OS thread of execution.
Call runtime.Gosched() inside Crawl from time to time. Even though CPU load is 100%, this will explicitly pass control to another goroutine.
Edit: Yes, and the reason why fmt.Print makes a difference: because it explicitly passes control to some syscall stuff... ;)
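As a rough sketch of the runtime.Gosched() route, the polling loop below yields in its default case so that the worker goroutines get a turn. The function waitForWorkers and its trivial workers are assumptions made up for this example; they stand in for the loop in Crawl and are not the original code.

```go
package main

import (
	"fmt"
	"runtime"
)

// waitForWorkers is a hypothetical stand-in for the polling loop in Crawl.
// It starts n trivial workers and polls until all of them have reported.
func waitForWorkers(n int) int {
	done := make(chan bool, n)
	for i := 0; i < n; i++ {
		go func() { done <- true }()
	}

	finished := 0
	for finished < n {
		select {
		case <-done:
			finished++
		default:
			// Nothing ready yet: yield so that other goroutines can run
			// instead of spinning in this loop.
			runtime.Gosched()
		}
	}
	return finished
}

func main() {
	fmt.Println(waitForWorkers(3)) // 3
}
```

The loop still burns CPU while it polls, so this stays a workaround; dropping the default case and letting select block avoids the spinning altogether.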