
Is it bad practice to just kick off new threads for blocking operations (Perl)?

For CPU-intensive tasks, I believe it is optimal to have one thread per core. If you have a 4-core CPU, you can run 4 instances of a CPU-intensive subroutine without any penalty. For example, I once experimentally ran several instances of a CPU-intensive algorithm on a four-core CPU. With up to four instances, the time per process did not get worse. With a fifth instance, all instances took longer.

What is the case for blocking operations? Let's say I have a list of 1,000 URLs. I have been doing the following:

(Please don't mind any syntax errors, I just mocked this up)

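use threads;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

# Start one thread per URL; each thread blocks on its own HTTP request.
my @threads;
foreach my $url (@urlList) {
    push @threads, async {
        my $response = $ua->get($url);
        return $response->content;
    };
}

# Collect the results in the order the threads were started.
foreach my $thread (@threads) {
    my ($content) = $thread->join;
    do_stuff($content);
}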

I am essentially kicking off as many threads as there are URLs in the URL list. If there are a million URLs, then a million threads will be kicked off. Is this optimal? If not, what is an optimal number of threads? Is using threads a good practice for ANY blocking I/O operation that can wait (reading a file, database queries, etc.)?

Related Bonus Question

Out of curiosity, do Perl threads work the same way as Python's, with its GIL? With Python, to get the benefit of multithreading and use all cores for CPU-intensive tasks, you have to use multiprocessing.

john doe asked Jun 24 '13 13:06



2 Answers

Out of curiosity, do Perl threads work the same way as Python's, with its GIL? With Python, to get the benefit of multithreading and use all cores for CPU-intensive tasks, you have to use multiprocessing.

No, but the conclusion is the same. Perl doesn't have a big lock protecting the interpreter across threads; instead it has a duplicate interpreter for each different thread. Since a variable belongs to an interpreter (and only one interpreter), no data is shared by default between threads. When variables are explicitly shared they're placed in a shared interpreter which serializes all accesses to shared variables on behalf of the other threads. In addition to the memory issues mentioned by others here, there are also some serious performance issues with threads in Perl, as well as limitations on the kind of data that can be shared and what you can do with it (see perlthrtut for more info).
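To make the "no data is shared by default" point concrete, here is a minimal sketch, assuming a Perl built with thread support. The variable names are made up for illustration; `threads::shared` and `lock` are the standard mechanisms for explicit sharing.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $private = 0;            # each thread gets its own copy of this
my $shared :shared = 0;     # explicitly shared across all threads

my @workers = map {
    threads->create(sub {
        $private++;                      # changes this thread's copy only
        { lock($shared); $shared++; }    # serialized update of the shared value
        return;
    });
} 1 .. 4;

$_->join for @workers;

# The parent's $private is untouched; $shared saw all four increments.
print "private=$private shared=$shared\n";
```

This should print `private=0 shared=4`: the increments to `$private` happened in the per-thread copies and were thrown away when the threads ended.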

The upshot is, if you need to parallelize a lot of IO and you can make it non-blocking, you'll get a lot more performance out of an event loop model than threads. If you need to parallelize stuff that can't be made non-blocking, you'll probably have a lot more luck with multiple processes than with Perl threads (and once you're familiar with that kind of code, it's also easier to debug).
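As a rough illustration of the multi-process route, here is a sketch using only core `fork` and `pipe`. The worker count, the job list, and `process_job` are all placeholders (in the question's case `process_job` would be the blocking HTTP fetch); this is one way to split the work, not the only one.

```perl
use strict;
use warnings;

my @jobs    = map { "job-$_" } 1 .. 8;   # stand-ins for the URL list
my $WORKERS = 2;                          # hypothetical; tune for your workload

# Placeholder for the blocking operation (e.g. an HTTP request).
sub process_job { my ($job) = @_; return uc $job; }

my @readers;
for my $w (0 .. $WORKERS - 1) {
    pipe(my $reader, my $writer) or die "pipe: $!";
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {                      # child process
        close $reader;
        # Each child handles every $WORKERS-th job, starting at its own index.
        for my $i (grep { $_ % $WORKERS == $w } 0 .. $#jobs) {
            print {$writer} process_job($jobs[$i]), "\n";
        }
        close $writer;
        exit 0;
    }
    close $writer;                        # parent keeps only the read end
    push @readers, $reader;
}

# Read each child's results until it closes its end of the pipe.
my @results;
for my $reader (@readers) {
    chomp(my @lines = <$reader>);
    push @results, @lines;
    close $reader;
}
1 while wait() != -1;                     # reap all children

printf "%d results\n", scalar @results;
```

The number of processes stays fixed no matter how long the job list gets, which is the main difference from starting one thread per URL.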

It's also possible to combine the two models (for example, a mostly-single-process evented app that passes off certain expensive work to child processes using POE::Wheel::Run or AnyEvent::Run, or a multi-process app that has an evented parent managing non-evented children, or a Node Cluster type setup where you have a number of preforked evented webservers, with a parent that just accepts and passes FDs to its children).
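A very stripped-down version of "an evented parent managing non-evented children" might look like the sketch below. It uses the core `IO::Select` module in place of a real event loop such as POE or AnyEvent, so treat it as an illustration of the shape only; `slow_task` is a made-up stand-in for expensive blocking work.

```perl
use strict;
use warnings;
use IO::Select;

# Hypothetical blocking task: sleeps for a bit, then reports back.
sub slow_task {
    my ($n) = @_;
    select(undef, undef, undef, 0.1 * $n);   # fractional-second sleep
    return "task $n done";
}

my $select = IO::Select->new;
my %output;                                  # per-child buffers, keyed by fd

for my $n (1 .. 3) {
    pipe(my $reader, my $writer) or die "pipe: $!";
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {                         # child: do the blocking work
        close $reader;
        print {$writer} slow_task($n), "\n";
        close $writer;
        exit 0;
    }
    close $writer;
    $output{ fileno $reader } = '';
    $select->add($reader);
}

# Parent: react to whichever child has data, without blocking on any one.
my @finished;
while ($select->count) {
    for my $fh ($select->can_read) {
        my $fd    = fileno $fh;
        my $bytes = sysread($fh, my $chunk, 4096);
        die "sysread: $!" unless defined $bytes;
        if ($bytes) { $output{$fd} .= $chunk; next; }
        # Zero bytes means EOF: this child is done.
        chomp(my $line = delete $output{$fd});
        push @finished, $line;
        $select->remove($fh);
        close $fh;
    }
}
1 while wait() != -1;                        # reap all children

print "$_\n" for @finished;
```

Results are collected in whatever order the children finish, which is the point of letting the parent multiplex instead of joining workers one by one.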

There are no silver bullets, though, at least not yet.

hobbs answered Oct 22 '22 19:10



From here: http://perldoc.perl.org/threads.html

Memory consumption

On most systems, frequent and continual creation and destruction of threads can lead to ever-increasing growth in the memory footprint of the Perl interpreter. While it is simple to just launch threads and then ->join() or ->detach() them, for long-lived applications, it is better to maintain a pool of threads, and to reuse them for the work needed, using queues to notify threads of pending work. The CPAN distribution of this module contains a simple example (examples/pool_reuse.pl) illustrating the creation, use and monitoring of a pool of reusable threads.
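For reference, a pool along the lines described above could be sketched like this. It is not the `examples/pool_reuse.pl` script from the distribution, just a minimal version using `Thread::Queue`, and it assumes a Perl built with thread support and a `Thread::Queue` recent enough to have `end` (3.01 or later). The pool size, the job list, and `process_job` are placeholders.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $POOL_SIZE = 4;                           # hypothetical; tune for your workload
my @jobs      = map { "job-$_" } 1 .. 20;    # stand-ins for the URL list

my $work    = Thread::Queue->new;            # jobs waiting to be processed
my $results = Thread::Queue->new;            # finished results

# Placeholder for the blocking operation (e.g. $ua->get($url)).
sub process_job { my ($job) = @_; return uc $job; }

# Start a fixed number of workers; each one pulls jobs until the queue ends.
my @pool = map {
    threads->create(sub {
        while (defined(my $job = $work->dequeue)) {
            $results->enqueue(process_job($job));
        }
        return;
    });
} 1 .. $POOL_SIZE;

$work->enqueue(@jobs);
$work->end;                                  # no more jobs: dequeue returns undef when empty
$_->join for @pool;

# All workers have finished, so drain the results without blocking.
my @done;
while (defined(my $r = $results->dequeue_nb)) {
    push @done, $r;
}
printf "%d results from %d threads\n", scalar @done, $POOL_SIZE;
```

With this layout a million URLs still means only `$POOL_SIZE` threads; the queue holds the backlog instead.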

Jean answered Oct 22 '22 20:10
