Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does my code work correctly when I run wg.Wait() inside a goroutine?

I have a list of urls that I am scraping. What I want to do is store all of the successfully scraped page data into a channel, and when I am done, dump it into a slice. I don't know how many successful fetches I will get, so I cannot specify a fixed length. I expected the code to reach wg.Wait() and then wait until all the wg.Done() methods are called, but I never reached the close(queue) statement. Looking for a similar answer, I came across this SO answer

https://stackoverflow.com/a/31573574/5721702

where the author does something similar:

ports := make(chan string)
toScan := make(chan int)
var wg sync.WaitGroup

// make 100 workers for dialing
for i := 0; i < 100; i++ {
    wg.Add(1)
    go func() {
        defer wg.Done()
        for p := range toScan {
            ports <- worker(*host, p)
        }
    }()
}

// close our receiving ports channel once all workers are done
go func() {
    wg.Wait()
    close(ports)
}()

As soon as I wrapped my wg.Wait() inside the goroutine, close(queue) was reached:

urls := getListOfURLS()
activities := make([]Activity, 0, limit)
queue := make(chan Activity)
for i, activityURL := range urls {
    wg.Add(1)
    go func(i int, url string) {
        defer wg.Done()
        activity, err := extractDetail(url)
        if err != nil {
            log.Println(err)
            return
        }
        queue <- activity
    }(i, activityURL)
}
    // calling it like this without the goroutine causes the execution to hang
// wg.Wait() 
// close(queue)

    // calling it like this successfully waits
go func() {
    wg.Wait()
    close(queue)
}()
for a := range queue {
    // block channel until valid url is added to queue
    // once all are added, close it
    activities = append(activities, a)
}

Why does the code not reach the close if I don't use a goroutine for wg.Wait()? I would think that the all of the defer wg.Done() statements are called so eventually it would clear up, because it gets to the wg.Wait(). Does it have to do with receiving values in my channel?

like image 938
Huy-Anh Hoang Avatar asked Oct 04 '17 08:10

Huy-Anh Hoang


People also ask

How do you wait for a Goroutine to finish?

In order to be able to run the goroutines until the finish, we can either make use of a channel that will act as a blocker or we can use waitGroups that Go's sync package provides us with.

How many Goroutines can run at once?

At any time, at most only one thread is allowed to run per core. But scheduler can create more threads if required, but that rarely happens. If your program doesn't start any additional goroutines, it will naturally run in only one thread no matter how many cores you allow it to use.

What is a Goroutine?

A goroutine is a function that executes simultaneously with other goroutines in a program and are lightweight threads managed by Go. A goroutine takes about 2kB of stack space to initialize.


2 Answers

You need to wait for goroutines to finish in a separate thread because queue needs to be read from. When you do the following:

queue := make(chan Activity)
for i, activityURL := range urls {
    wg.Add(1)
    go func(i int, url string) {
        defer wg.Done()
        activity, err := extractDetail(url)
        if err != nil {
            log.Println(err)
            return
        }
        queue <- activity // nothing is reading data from queue.
    }(i, activityURL)
}

wg.Wait() 
close(queue)

for a := range queue {
    activities = append(activities, a)
}

Each goroutine blocks at queue <- activity since queue is unbuffered and nothing is reading data from it. This is because the range loop on queue is in the main thread after wg.Wait.

wg.Wait will only unblock once all the goroutine return. But as mentioned, all the goroutines are blocked at channel send.

When you use a separate goroutine to wait, code execution actually reaches the range loop on queue.

// wg.Wait does not block the main thread.
go func() {
    wg.Wait()
    close(queue)
}()

This results in the goroutines unblocking at the queue <- activity statement (main thread starts reading off queue) and running until completion. Which in turn calls each individual wg.Done.

Once the waiting goroutine get past wg.Wait, queue is closed and the main thread exits the range loop on it.

like image 68
abhink Avatar answered Oct 17 '22 18:10

abhink


queue channel is unbuffered so every goroutine trying to write to it gets blocked because reader process is not yet started. So no goroutinte can write and they all hang - as a result wg.Wait waits forever. Try to launch reader in a separate goroutine:

go func() {
    for a := range queue {
        // block channel until valid url is added to queue
        // once all are added, close it
       activities = append(activities, a)
    }
}()

and then start waiter:

wg.Wait() 
close(queue)

This way you can not to accumulate all the data in channel and overload it, but get data as it comes and put to target slice.

like image 20
Eugene Lisitsky Avatar answered Oct 17 '22 17:10

Eugene Lisitsky