Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can an EJB parallelize a long, CPU intensive process?

The application has a CPU intensive long process that currently runs on one server (an EJB method) serially when the client requests it.

It’s theoretically possible (from a conceptual point of view) to split that process in N chunks and execute them in parallel, as long as the output of all parallel jobs can be collected and joined together before sending it back to the client that initiated the process. I’d like to use this parallelization to optimize performance.

How can I implement this parallelization with EJBs? I know that we should not create threads in a EJB method. Instead, we should publish messages (one per job) to be consumed by message driven beans (MDBs). But then it would not be a synchronous call anymore. And being synchronous seems to be a requirement in this case since I need to collect the output of all jobs before sending it back to the client.

Is there a solution for this?

like image 415
b.roth Avatar asked Jan 05 '10 12:01

b.roth


3 Answers

There are all sorts of ways to do this.

One, you can use an EJB Timer to create a run-once process that will start immediately. This is a good technique to spawn processes in the background. A EJB Timer is associated with a specific Session Bean implementation. You can either add an EJB Timer to every Session Bean that you want to be able to do this, or you can have a single Session Bean that can then call your application logic through some dispatch mechanism.

For me, I pass a serializable blob of parameters along with a class name that meets a specific interface to a generic Session Bean that then executes the class. This way I can easily background most anything.

One caveat about the EJB Timer is that EJB Timers are persistent. Once you create an EJB Timer is stays in the container until its job is finished or canceled. The gotcha on this is that if you have a long running process, and the server goes down, when it restarts the process will continue and pick back up. Mind this can be a good thing, but only if your process is prepared to be restarted. But if your have a simple process iterating through "10,000 items", if the server goes down on item 9,999, when it comes back up you can easily see it simply starting over at item 1. It's all workable, just a caveat to be aware of.

Another way to background something is you can use a JMS queue. Put a message on the queue, and the handler runs aysnchronously from the rest of your application.

The clever part here, and something I has also done leveraging the work with the Timer Bean, is you can control how many "jobs" will run based on how many MDB instances you configure the system to have.

So, for the specific task of running a process in multiple, parallel chunks, I take the task, break it up in to "pieces", and then send each piece on the Message Queue, where the MDBs execute them. If I allow 10 instances of the MDB, I can have 10 "parts" of any task running simultaneously.

This actually works surprisingly well. There's a little overhead it splitting the process up and routing it through the JMS queue, but that's all basically "start up time". Once it gets going, you get a real benefit.

Another benefit of using the Message Queue is you can have your actual long running processes executing on a separate machine, or you can readily create a cluster of machines to handle these processes. Yet, the interface is the same, and the code doesn't know the difference.

I've found once you've relegated a long running process to the background, you can pay the price of having less-that-instant access to that process. That is, there's no reason to monitor the executing classes themselves directly, just have them publish interesting information and statistic to the database, or JMX, or whatever rather than having something that can monitor the object directly because it shares the same memory space.

I was easily able to set up a framework that lets task run either on the EJB Timer or on the MDB scatter queue, the tasks are the same, and I could monitor their progress, stop them, etc.

You could combine the scatter technique to create several EJB Timer jobs. One of the free advantages of the MDB is it acts as a thread pool which can throttle your jobs (so you don't suddenly saturate your system with too many background processes). You get this "for free" just by leveraging the EJB management features in the container.

Finally, Java EE 6 has a new "asynchronous" (or something) qualifier for Session Bean methods. I do not know the details on how this works, as I've yet to play with a new Java EE 6 container. But I imagine you're probably not going to want to change containers just for this facility.

like image 65
Will Hartung Avatar answered Oct 18 '22 02:10

Will Hartung


This particular question has come up on multiple occasions and I will summarize that there are several possible solutions, only 1 of which I would recommend.

Use a WorkManager from the commonj API. It allows for managed threads in a Java EE container and is specifically designed to fit your use case. If you are using WebSphere or WebLogic, these API's are already available in your server. For others your will have to put a third party solution in yourself.

WorkManager info

Related questions Why Spawning threads is discouraged

like image 35
Robin Avatar answered Oct 18 '22 01:10

Robin


An EJB is a ultimately a transactional component for a client-server system providing request/reply semantics. If you find yourself in the position that you need to pigeonhole a long-running transaction within the bounds of a request/reply cycle, then somewhere your system architect(ure) has taken the wrong turn.

The situation you describe is cleanly and correctly handled by an event based architecture with a messaging back end. Initial event initiates the process (which can then be trivially parallelized by having the workers subscribe to the event topic) and the aggregating process itself raises an event on its completion. You can still squeeze these sequence within the bounds of a request/reply cycle, but you will by necessity violate the letter and spirit of the Java EE system architecture specs.

like image 4
alphazero Avatar answered Oct 18 '22 00:10

alphazero