I'm trying to build a parser for a large number of files, and I can't find information about what might possibly be called "nested goroutines" (maybe this is not the right name ?).
Given a lot of files, each of them having a lot of lines. Should I do:
for file in folder:
go do1
def do1:
for line in file:
go do2
def do2:
do_something
Or should I use only "one level" of goroutines, and do the following:
for file in folder:
for line in file:
go do_something
My question target primarily performance issues.
Thanks for reaching that sentence !
Goroutines can be nested to an arbitrary depth and the output will be even more random in many cases.
Goroutines are useful when you want to do multiple things simultaneously. For example, if you have ten things you want to do at the same time, you can do each one on a separate goroutine, and wait for all of them to finish.
First, you will create a program that uses goroutines to run multiple functions at once. Then you will add channels to that program to communicate between the running goroutines. Finally, you'll add more goroutines to the program to simulate a program running with multiple worker goroutines.
In the case of goroutines, since stack size can grow dynamically, you can spawn 1000 goroutines without a problem. As a goroutine starts with 8KB (2KB since Go 1.4) of stack space, most of them generally don't grow bigger than that.
The answer depends on how processor-intensive the operation on each line is.
If the line operation is short-lived, definitely don't bother to spawn a goroutine for each line.
If it's expensive (think ~5 secs or more), proceed with caution. You may run out of memory. As of Go 1.4, spawning a goroutine allocates a 2048 byte stack. For 2 million lines, you could allocate over 2GB of RAM for the goroutine stacks alone. Consider whether it's worth allocating this memory.
In short, you will probably get the best results with the following setup:
for file in folder:
go process_file(file)
If the number of files exceeds the number of CPUs, you're likely to have enough concurrency to mask the disk I/O latency involved in reading the files from disk.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With