
Should we do nested goroutines?

I'm trying to build a parser for a large number of files, and I can't find information about what might be called "nested goroutines" (maybe this is not the right name?).

Given a lot of files, each with a lot of lines, should I do:

for file in folder:
    go do1

def do1:
    for line in file:
        go do2

def do2:
    do_something

Or should I use only "one level" of goroutines, and do the following:

for file in folder:
    for line in file:
        go do_something
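
In actual Go, the two options would look roughly like the sketch below. The helper names (readLines, processLine) are placeholders for my real code, and I've left out any synchronization (waiting for the goroutines to finish), just like in the pseudocode:

package parser

import (
    "bufio"
    "os"
)

// readLines and processLine stand in for the file reading and the
// per-line work ("do_something") from the pseudocode above.
func readLines(path string) []string {
    f, err := os.Open(path)
    if err != nil {
        return nil
    }
    defer f.Close()

    var lines []string
    s := bufio.NewScanner(f)
    for s.Scan() {
        lines = append(lines, s.Text())
    }
    return lines
}

func processLine(line string) {}

// Option 1: nested goroutines, one per file and one per line.
func nestedVersion(files []string) {
    for _, f := range files {
        go func(file string) {
            for _, line := range readLines(file) {
                go processLine(line)
            }
        }(f)
    }
}

// Option 2: a single level, one goroutine per line.
func flatVersion(files []string) {
    for _, f := range files {
        for _, line := range readLines(f) {
            go processLine(line)
        }
    }
}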

My question is primarily about performance.

Thanks for reading this far!

asked Feb 14 '14 by mazieres

People also ask

Can you have nested Goroutines?

Yes. Goroutines can be nested to an arbitrary depth, and with more levels of nesting the interleaving of their output becomes even less predictable.

When should we use Goroutines?

Goroutines are useful when you want to do multiple things simultaneously. For example, if you have ten things you want to do at the same time, you can do each one on a separate goroutine, and wait for all of them to finish.
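
For instance, a minimal sketch of that pattern, waiting for ten goroutines with sync.WaitGroup (the work itself is just a placeholder print):

package main

import (
    "fmt"
    "sync"
)

func main() {
    var wg sync.WaitGroup
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func(n int) {
            defer wg.Done()
            fmt.Println("task", n, "done") // placeholder for real work
        }(i)
    }
    wg.Wait() // block until all ten goroutines have finished
}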

How do you run multiple Goroutines?

First, you will create a program that uses goroutines to run multiple functions at once. Then you will add channels to that program to communicate between the running goroutines. Finally, you'll add more goroutines to the program to simulate a program running with multiple worker goroutines.
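
A rough, self-contained illustration of goroutines communicating over a channel (the worker count and messages are made up for the example):

package main

import "fmt"

func main() {
    results := make(chan string, 3)

    // Start three worker goroutines that each send one result.
    for i := 1; i <= 3; i++ {
        go func(id int) {
            results <- fmt.Sprintf("worker %d finished", id)
        }(i)
    }

    // Receive one message per worker; this also waits for them.
    for i := 0; i < 3; i++ {
        fmt.Println(<-results)
    }
}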

How many Goroutines can you start?

In the case of goroutines, since stack size can grow dynamically, you can spawn 1000 goroutines without a problem. As a goroutine starts with 8KB (2KB since Go 1.4) of stack space, most of them generally don't grow bigger than that.


1 Answer

The answer depends on how processor-intensive the operation on each line is.

If the line operation is short-lived, definitely don't bother to spawn a goroutine for each line.

If it's expensive (think ~5 secs or more), proceed with caution. You may run out of memory. As of Go 1.4, spawning a goroutine allocates a 2048-byte stack. For 2 million lines, that could add up to roughly 4 GB of RAM (2,000,000 × 2 KB) for the goroutine stacks alone. Consider whether it's worth allocating this memory.
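
If the per-line work really is expensive enough to justify concurrency, one common middle ground (not part of the original answer) is to cap the number of in-flight goroutines with a buffered channel used as a semaphore. A rough sketch, with processLine standing in for the real work:

package parser

import "sync"

// processLine is a placeholder for the expensive per-line operation.
func processLine(line string) {}

// processLines runs processLine concurrently, but keeps at most
// maxInFlight goroutines alive at any time, bounding stack memory.
func processLines(lines []string, maxInFlight int) {
    sem := make(chan struct{}, maxInFlight)
    var wg sync.WaitGroup

    for _, line := range lines {
        wg.Add(1)
        sem <- struct{}{} // acquire a slot before spawning
        go func(l string) {
            defer wg.Done()
            defer func() { <-sem }() // release the slot when done
            processLine(l)
        }(line)
    }
    wg.Wait()
}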

In short, you will probably get the best results with the following setup:

for file in folder:
    go process_file(file)

If the number of files exceeds the number of CPUs, you're likely to have enough concurrency to mask the I/O latency of reading the files from disk.
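
A minimal runnable sketch of this per-file layout (processFile, doSomething, and the use of sync.WaitGroup are illustrative, not part of the original answer):

package main

import (
    "bufio"
    "fmt"
    "os"
    "sync"
)

// doSomething is a placeholder for the actual line parsing.
func doSomething(line string) {}

// processFile reads one file and handles its lines sequentially;
// the concurrency lives at the file level only.
func processFile(path string) {
    f, err := os.Open(path)
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        return
    }
    defer f.Close()

    s := bufio.NewScanner(f)
    for s.Scan() {
        doSomething(s.Text())
    }
}

func main() {
    files := os.Args[1:] // e.g. pass the folder's files as arguments

    var wg sync.WaitGroup
    for _, path := range files {
        wg.Add(1)
        go func(p string) {
            defer wg.Done()
            processFile(p)
        }(path)
    }
    wg.Wait() // wait for every file to be processed
}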

answered Nov 06 '22 by Brian Holder-Chow