
What's the best IPC mechanism for medium-sized data in Perl? [closed]

Tags: perl, ipc

I'm working on designing a multi-tiered app in Perl and I'm wondering about the pros and cons of the various IPC mechanisms available to me. I'm looking at handling moderately-sized data, typically a few dozen kilobytes but up to a couple of megabytes, and the load is pretty light, at most a couple of hundred requests per minute.

My primary concerns are maintainability and performance (in that order). I don't think I'll need to scale up to more than one server, or port off of our main platform (RHEL), but I suppose it's something to consider.

I can think of the following options:

  • Temporary files - Simplistic, probably the worst option in terms of speed and storage requirements
  • UNIX domain sockets - Not portable, not scalable
  • Internet Sockets - Portable, scalable
  • Pipes - Portable, not scalable (?)

Considering that scalability and portability are not my primary concerns, I need to learn more. What's the best choice, and why? Please comment if you need additional information.
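To make the pipe option concrete, here's roughly the shape I have in mind: a request flows right over one pipe and a response flows back left over another (the payload and the "processing" are obviously placeholders):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One pipe per direction: parent sends a request, child sends a response.
pipe(my $req_r, my $req_w) or die "pipe: $!";
pipe(my $res_r, my $res_w) or die "pipe: $!";

my $pid = fork();
die "fork: $!" unless defined $pid;

if ($pid == 0) {                        # child: the "business logic" end
    close $req_w; close $res_r;
    my $request = <$req_r>;
    chomp $request;
    print {$res_w} uc($request), "\n";  # stand-in for real processing
    exit 0;
}

close $req_r; close $res_w;             # parent: the "request manager" end
print {$req_w} "lookup user 42\n";
close $req_w;                           # flushes and signals EOF to the child
my $response = <$res_r>;
print $response;
waitpid($pid, 0);
```

The limitation that makes me call pipes "not scalable" is visible here: the two endpoints must share a common ancestor, so this only works between related processes on one machine.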


EDIT: I'll try to give more detail in response to ysth's questions (warning, wall of text follows):

  • Are readers/writers in a one-to-one relationship, or something more complicated?
  • What do you want to happen to the writer if the reader is no longer there or busy?
  • And vice versa?
  • What other information do you have about your desired usage?

At this point, I'm contemplating a three-tiered approach, but I'm not sure how many processes I'll have in each tier. I think I need to have more processes towards the left side and fewer toward the right, but maybe I should have the same number across the board:

 .---------.          .----------.        .-------.
 | Request |  ----->  | Business | -----> | Data  |
 | Manager |  <-----  |  Logic   | <----- | Layer |
 `---------'          `----------'        `-------'

These names are still generic and probably won't make it into the implementation in these forms.

The request manager is responsible for listening for requests from different interfaces, for example web requests and CLI (where response time is important) and e-mail (where response time is less important). It performs logging and manages the responses to the requests (which are rendered in a format appropriate to the type of request).

It sends data about the request to the business logic, which performs logging, authorization depending on business rules, etc.

The business logic (if it needs to) then requests data from the data layer, which can either talk to (most often) the internal MySQL database or some other data source outside our team's control (e.g., our organization's primary LDAP servers, or our DB2 employee information database, etc.). This is mostly a simple wrapper which formats the data in a uniform way so that it can be handled more easily in the business logic.

The information then flows back to the request manager for presentation.

If, when data is flowing to the right, the reader is busy, for the interactive requests I'd like to simply wait a suitable period of time, and return a timeout error if I don't get access in that amount of time (e.g. "Try again later"). For the non-interactive requests (e.g. e-mail), the polling system can simply exit and try again on the next invocation (which will probably be once per 1-3 minutes).
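That "wait a suitable period, then give up" behavior is something like this sketch using IO::Select (the helper name and the timeout value are made up for illustration):

```perl
use strict;
use warnings;
use IO::Select;

# Hypothetical helper: wait up to $timeout seconds for the downstream
# tier to have data for us; return undef (i.e. "try again later") on timeout.
sub read_with_timeout {
    my ($fh, $timeout) = @_;
    my $sel = IO::Select->new($fh);
    return undef unless $sel->can_read($timeout);   # empty list => timed out
    return scalar <$fh>;
}

# Demonstration against a pipe that never receives any data:
pipe(my $r, my $w) or die "pipe: $!";
my $got = read_with_timeout($r, 0.2);
print defined $got ? "got: $got" : "timed out\n";
```

For the non-interactive (e-mail) path, the caller would just treat the undef as "exit and retry on the next invocation."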

When data is flowing in the other direction, there shouldn't be any waiting situations. If one of the processes has died while data is traveling back to the left, all I can really do is log the failure and exit.

Anyway, that was pretty verbose, and since I'm still in early design I probably still have some confused ideas in there. Some of what I've mentioned is probably tangential to the issue of which IPC system to use. I'm open to other suggestions on the design, but I was trying to keep the question limited in scope (for example, maybe I should consider collapsing down to two tiers, which would be much simpler for IPC). What are your thoughts?

Adam Bellaire, asked Jan 10 '09 at 19:01


4 Answers

If you're unsure about your exact requirements at the moment, try to think of a simple interface that you can code to, one that any IPC implementation (be it temporary files, TCP/IP or whatever) needs to support. You can then choose a particular IPC flavour (I would start with whatever's simplest and easiest to debug -- probably temporary files) and implement the interface using it. If that turns out to be too slow, implement the interface using e.g. TCP/IP. Actually implementing the interface doesn't involve much work, as you will essentially just be forwarding calls to some existing library.

The point is that you have a high-level task to perform ("transmit data from program A to program B") which is more or less independent of the details of how it is performed. By establishing an interface and coding to it, you isolate the main program from changes in the event that you need to change the implementation.

Note that you don't need to use any heavyweight Perl language mechanisms to capitalise on the idea of having an interface. You could simply have e.g. 3 different packages (for temp files, TCP/IP, Unix domain sockets), each of which exports the same set of methods. Choosing which implementation you want to use in your main program amounts to choosing which module to use.
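To illustrate (all the package and method names here are invented for the example), a temp-file backend might look like this, and an IPC::Backend::TCP exposing the same two methods could later be swapped in by changing a single `use` line or a config value:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# One of several hypothetical backends, all exposing the same
# write_msg/read_msg interface.
package IPC::Backend::TempFile;

sub new {
    my ($class, %args) = @_;
    return bless { file => $args{file} }, $class;
}

sub write_msg {
    my ($self, $data) = @_;
    open my $fh, '>', $self->{file} or die "open: $!";
    print {$fh} $data;
    close $fh or die "close: $!";
}

sub read_msg {
    my ($self) = @_;
    open my $fh, '<', $self->{file} or die "open: $!";
    local $/;                      # slurp the whole message
    my $data = <$fh>;
    close $fh;
    return $data;
}

package main;

my ($tmp_fh, $path) = tempfile(UNLINK => 1);
my $ipc = IPC::Backend::TempFile->new(file => $path);
$ipc->write_msg("payload");
my $back = $ipc->read_msg;
print "$back\n";
```

The main program only ever calls `write_msg`/`read_msg`, so it never needs to know which transport is underneath.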

j_random_hacker, answered Oct 23 '22 at 06:10


Temporary files (and related things, like a shared memory region) are probably a bad bet. If you ever want to run your server on one machine and your clients on another, you will need to rewrite your application. If you pick any of the other options, at least the semantics are essentially the same if you need to switch between them at a later date.

My only real advice, though, is not to write this yourself. On the server side, you should use POE (or Coro, etc.) rather than doing select on the socket yourself. Also, if your interface is going to be RPC-ish, use something like JSON::RPC::Common from CPAN.

Finally, there is IPC::PubSub, which might work for you.

jrockway, answered Oct 23 '22 at 08:10


Temporary files have other problems besides that. I think Internet sockets are really the best choice. They are well documented, and as you say, scalable and portable. Even if that is not a core requirement, you get it nearly for free. Sockets are pretty easy to deal with, and again there are copious amounts of documentation. You can build your data-sharing mechanism and protocol out in a library and never have to look at it again!
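A minimal sketch of what that looks like with IO::Socket::INET -- here the "server" and "client" are just a forked parent and child on localhost, with the OS picking a free port:

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Listen on an ephemeral port on loopback.
my $server = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,              # let the OS choose a free port
    Listen    => 5,
    Proto     => 'tcp',
) or die "server: $!";
my $port = $server->sockport;

my $pid = fork();
die "fork: $!" unless defined $pid;

if ($pid == 0) {                 # child: serve one request, then exit
    my $conn = $server->accept or exit 1;
    my $line = <$conn>;
    print {$conn} "echo: $line"; # IO::Socket handles autoflush
    close $conn;
    exit 0;
}

my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $port,
    Proto    => 'tcp',
) or die "client: $!";
print {$client} "ping\n";
my $reply = <$client>;
print $reply;
close $client;
waitpid($pid, 0);
```

The nice part is that moving the two halves onto different machines later only changes the addresses, not the code structure.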

BobbyShaftoe, answered Oct 23 '22 at 07:10


UNIX domain sockets are portable across Unices. They're no less portable than pipes, and they're more efficient than IP sockets.
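For related processes you don't even need a filesystem path: `socketpair` gives you a connected, bidirectional UNIX-domain pair, which this sketch uses across a fork:

```perl
use strict;
use warnings;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);
use IO::Handle;

# A connected pair of UNIX-domain stream sockets, one end per process.
socketpair(my $left, my $right, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
$left->autoflush(1);
$right->autoflush(1);

my $pid = fork();
die "fork: $!" unless defined $pid;

if ($pid == 0) {                # child talks on $right
    close $left;
    my $msg = <$right>;
    print {$right} "ack: $msg";
    exit 0;
}

close $right;                   # parent talks on $left
print {$left} "hello\n";
my $reply = <$left>;
print $reply;
waitpid($pid, 0);
```

Unlike a pipe pair, a single socketpair carries traffic in both directions.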

Anyway, you missed a few options, shared memory for example. Some would add databases to that list but I'd say that's a rather heavyweight solution.

Message queues would also be a possibility, though you'd have to change a kernel option for it to handle such large messages. Otherwise, they have an ideal interface for a lot of things, and IMHO they are greatly underused.
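As a sketch of that interface, here's a SysV message queue round-trip using the core IPC::SysV module and the msgget/msgsnd/msgrcv builtins (the type value and payload are arbitrary; note the per-message size limit I mentioned, typically 8 KB by default on Linux):

```perl
use strict;
use warnings;
use IPC::SysV qw(IPC_PRIVATE IPC_RMID S_IRUSR S_IWUSR);

# Create a private queue, send one typed message, read it back, clean up.
my $id = msgget(IPC_PRIVATE, S_IRUSR | S_IWUSR);
defined $id or die "msgget: $!";

my $type = 1;                   # messages carry a type, so readers can select
msgsnd($id, pack("l! a*", $type, "hello queue"), 0) or die "msgsnd: $!";

my $buf = '';
msgrcv($id, $buf, 1024, 0, 0) or die "msgrcv: $!";   # type 0 => first message
my ($rtype, $payload) = unpack("l! a*", $buf);
print "$payload\n";

msgctl($id, IPC_RMID, 0) or die "msgctl: $!";        # queues outlive processes
```

The typed-message selection (the fourth argument to msgrcv) is part of what makes the interface so pleasant for request routing.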

I generally agree, though, that using an existing solution is better than building something of your own. I don't know the specifics of your problem, but I'd suggest you check out the IPC section of CPAN.

Leon Timmermans, answered Oct 23 '22 at 08:10