Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Use cases for ithreads (interpreter threads) in Perl and rationale for using or not using them?

If you want to learn how to use Perl interpreter threads, there's good documentation in perlthrtut (threads tutorial) and the threads pragma manpage. It's definitely good enough to write some simple scripts.

However, I have found little guidance on the web on why and what to sensibly use Perl's interpreter threads for. In fact, there's not much talk about them, and if people talk about them it's quite often to discourage people from using them.

These threads, available when perl -V:useithreads is useithreads='define'; and unleashed by use threads, are also called ithreads, and maybe more appropriately so as they are very different from threads as offered by the Linux or Windows operating systems or the Java VM in that nothing is shared by default and instead a lot of data is copied, not just the thread stack, thus significantly increasing the process size. (To see the effect, load some modules in a test script, then create threads in a loop pausing for key presses each time around, and watch memory rise in task manager or top.)

[...] every time you start a thread all data structures are copied to the new thread. And when I say all, I mean all. This e.g. includes package stashes, global variables, lexicals in scope. Everything!

-- Things you need to know before programming Perl ithreads (Perlmonks 2003)

When researching the subject of Perl ithreads, you'll see people discouraging you from using them ("extremely bad idea", "fundamentally flawed", or "never use ithreads for anything").

The Perl thread tutorial highlights that "Perl Threads Are Different", but it doesn't much bother to explain how they are different and what that means for the user.

A useful but very brief explanation of what ithreads really are is from the Coro manpage under the heading WINDOWS PROCESS EMULATION. The author of that module (Coro - the only real threads in perl) also discourages using Perl interpreter threads.

Somewhere I read that compiling perl with threads enabled will result in a significantly slower interpreter.

There's a Perlmonks page from 2003 (Things you need to know before programming Perl ithreads), in which the author asks: "Now you may wonder why Perl ithreads didn't use fork()? Wouldn't that have made a lot more sense?" This seems to have been written by the author of the forks pragma. Not sure the info given on that page still holds true in 2012 for newer Perls.

Here are some guidelines for usage of threads in Perl I have distilled from my readings (maybe erroneously so):

  • Consider using non-blocking IO instead of threads, like with HTTP::Async, or AnyEvent::Socket, or Coro::Socket.
  • Consider using Perl interpreter threads on Windows only, not on UNIX because on UNIX, forks are more efficient both for memory and execution speed.
  • Create threads at beginning of program, not when memory concumption already considerable - see "ideal way to reduce these costs" in perlthrtut.
  • Minimize communication between threads because it's slow (all answers on that page).

So far my research. Now, thanks for any more light you can shed on this issue of threads in Perl. What are some sensible use cases for ithreads in Perl? What is the rationale for using or not using them?

like image 345
Lumi Avatar asked Apr 02 '12 09:04

Lumi


People also ask

How do I use multithreading in Perl?

Creating Threads Like any other module, you need to tell Perl that you want to use it; use threads; imports all the pieces you need to create basic threads. The create() method takes a reference to a subroutine and creates a new thread that starts executing in the referenced subroutine.

Does Perl support multithreading?

Perl can do asynchronous programming with modules like IO::Async or Coro, but it's single threaded. You can compile Perl with threads, which provide multi-threaded computing.

Is Perl a lightweight?

#8 Perl Is Used For Serving Web Pages Want an extremely lightweight, easily customizable Web server? You can implement one in Perl in under 200 lines of code (compare this to Apache HTTP, which weighs in at around 200,000 lines of code).


2 Answers

The short answer is that they're quite heavy (you can't launch 100+ of them cheaply), and they exhibit unexpected behaviours (somewhat mitigated by recent CPAN modules).

You can safely use Perl ithreads by treating them as independent Actors.

  1. Create a Thread::Queue::Any for "work".
  2. Launch multiple ithreads and "result" Queues passing them the ("work" + own "result") Queues by closure.
  3. Load (require) all the remaining code your application requires (not before threads!)
  4. Add work for the threads into the Queue as required.

In "worker" ithreads:

  1. Bring in any common code (for any kind of job)
  2. Blocking-dequeue a piece of work from the Queue
  3. Demand-load any other dependencies required for this piece of work.
  4. Do the work.
  5. Pass the result back to the main thread via the "result" queue.
  6. Back to 2.

If some "worker" threads start to get a little beefy, and you need to limit "worker" threads to some number then launch new ones in their place, then create a "launcher" thread first, whose job it is to launch "worker" threads and hook them up to the main thread.

What are the main problems with Perl ithreads?

They're a little inconvenient with for "shared" data as you need to explicity do the sharing (not a big issue).

You need to look out for the behaviour of objects with DESTROY methods as they go out of scope in some thread (if they're still required in another!)

The big one: Data/variables that aren't explicitly shared are CLONED into new threads. This is a performance hit and probably not at all what you intended. The work around is to launch ithreads from a pretty much "pristine" condition (not many modules loaded).

IIRC, there are modules in the Threads:: namespace that help with making dependencies explicit and/or cleaning up cloned data for new threads.

Also, IIRC, there's a slightly different model using ithreads called "Apartment" threads, implemented by Thread::Appartment which has a different usage pattern and another set of trade-offs.

The upshot:

Don't use them unless you know what you're doing :-)

Fork may be more efficient on Unix, but the IPC story is much simpler for ithreads. (This may have been mitigated by CPAN modules since I last looked :-)

They're still better than Python's threads.

There might, one day, be something much better in Perl 6.

like image 55
David-SkyMesh Avatar answered Sep 28 '22 06:09

David-SkyMesh


I have used perl's "threads" on several occasions. They're most useful for launching some process and continuing on with something else. I don't have a lot of experience in the theory of how they work under the hood, but I do have a lot of practical coding experience with them.

For example, I have a server thread that listens for incoming network connections and spits out a status response when someone asks for it. I create that thread, then move on and create another thread that monitors the system, checking five items, sleeping a few seconds, and looping again. It might take 3-4 seconds to collect the monitor data, then it gets shoved into a shared variable, and the server thread can read that when needed and immediately return the last known result to whomever asks. The monitor thread, when it finds that an item is in a bad state, kicks off a separate thread to repair that item. Then it moves on, checking the other items while the bad one is repaired, and kicking off other threads for other bad items or joining finished repair threads. The main program all the while is looping every few seconds, making sure that the monitor and server threads aren't joinable/still running. All of this could be written as a bunch of separate programs utilizing some other form of IPC, but perl's threads make it simple.

Another place where I've used them is in a fractal generator. I would split up portions of the image using some algorithm and then launch as many threads as I have CPUs to do the work. They'd each stuff their results into a single GD object, which didn't cause problems because they were each working on different portions of the array, and then when done I'd write out the GD image. It was my introduction to using perl threads, and was a good introduction, but then I rewrote it in C and it was two orders of magnitude faster :-). Then I rewrote my perl threaded version to use Inline::C, and it was only 20% slower than the pure C version. Still, in most cases where you'd want to use threads due to being CPU intensive, you'd probably want to just choose another language.

As mentioned by others, fork and threads really overlap for a lot of purposes. Coro, however, doesn't really allow for multi-cpu use or parallel processing like fork and thread do, you'll only ever see your process using 100%. I'm over-simplifying this, but I think the easiest way to describe Coro is that it's a scheduler for your subroutines. If you have a subroutine that blocks you can hop to another and do something else while you wait, for example of you have an app that calculates results and writes them to a file. One block might calculate results and push them into a channel. When it runs out of work, another block starts writing them to disk. While that block is waiting on disk, the other block can start calculating results again if it gets more work. Admittedly I haven't done much with Coro; it sounds like a good way to speed some things up, but I'm a bit put off by not being able to do two things at once.

My own personal preference if I want to do multiprocessing is to use fork if I'm doing lots of small or short things, threads for a handful of large or long-lived things.

like image 43
Marcus Avatar answered Sep 28 '22 07:09

Marcus