Running 100,000 processes concurrently

I am simulating a banking system in which I have 100,000 transactions to run. Each type of transaction implements Runnable, and there are various types of transactions that can occur.

transactions is an array of Transaction objects, each of which implements Runnable.

Ideally, the following code would solve my issue:

for (Transaction transaction : transactions) {
    new Thread(transaction).start();
}

However, obviously a java.lang.OutOfMemoryError: unable to create new native thread is bound to occur when attempting to start 100,000 threads.

So next I tried implementing an ExecutorService to create a thread pool to manage my 100,000 runnables.

ExecutorService service;
int cpus = Runtime.getRuntime().availableProcessors();
// cpus == 8 in my case
service = Executors.newFixedThreadPool(cpus);

for (Transaction transaction : transactions) {
    service.execute(transaction);
}

When trying this approach, long transactions "hog" the JVM. For example, one type of transaction takes 30 to 60 seconds to execute. When profiling the application, no other threads are allowed to run while the long transaction takes place.

In this case, thread 6 did not allow any other threads to run until its single transaction was complete.

So my question is: How can I run 100,000 transactions as fast as possible without running into memory problems? If ExecutorService is the answer, then how can I stop very long transactions from hogging the JVM and allow other transactions to run concurrently?

EDIT:

I am forcing certain types of transactions to run for 30 to 60 seconds on purpose, to ensure that my threaded program is working correctly. Each transaction locks a single account, and there are 10 accounts. Here is the method which hogs the JVM (called by run()):

public void makeTransaction() {
    // Deliberately hold the account's monitor and busy-wait for
    // `nanos` nanoseconds to simulate a long-running transaction.
    synchronized (account) {
        long timeStarted = System.nanoTime();
        long timeToEnd = timeStarted + nanos;

        this.view = new BatchView(transactionNumber, account.getId());

        this.displayView();

        // Busy-wait (spinning at 100% CPU) until the deadline passes.
        while (true) {
            if (System.nanoTime() % 1000000000 == 0) {
                System.out.println("batch | " + account.getId());
            }

            if (System.nanoTime() >= timeToEnd) {
                break;
            }
        }
    }
}

Each time this transaction gets run, only the one account gets locked, leaving 9 others that should be available for processing. How come the JVM does not process any more threads, and instead hangs until this long transaction finishes?

Here is a link to a minified version of the project to demonstrate the problem: project

asked Mar 20 '15 by Feek
2 Answers

When profiling the application, no other threads are being allowed to run while the long transaction takes place.

Most likely, this task is using a resource which is single-threaded, i.e. the way it is written prevents concurrent usage.

How can I run 100,000 transactions as fast as possible without running into memory problems?

If the transactions are CPU bound, you should have a pool about the same size as the number of CPUs you have.

If the transactions depend on a database, you should look at batching them to utilise the database more efficiently.
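As a rough illustration of what batching looks like, here is a minimal JDBC sketch. The table name, columns, and the Transaction getters (getAccountId(), getAmount()) are assumptions for illustration, not part of the question's project:

import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical sketch: send many transaction rows to the database in
// one batch instead of issuing one statement per transaction.
void insertBatch(Connection connection, List<Transaction> transactions) throws SQLException {
    try (PreparedStatement ps = connection.prepareStatement(
            "INSERT INTO transactions (account_id, amount) VALUES (?, ?)")) {
        for (Transaction t : transactions) {
            ps.setLong(1, t.getAccountId());
            ps.setBigDecimal(2, t.getAmount());
            ps.addBatch();
        }
        ps.executeBatch(); // one round trip for the whole batch
    }
}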

If ExecutorService is the answer, then how can I stop very long transactions from hogging the JVM and allow other transactions to run concurrently?

Make the transactions much shorter. If you have a task which runs for more than a few milliseconds, you should work out why it is taking so long. I would start by looking at how much network/IO it is using and by profiling the task. Most transactions (if you have a large number) should take around 0.01 seconds, ideally far less.

You should take great care to consider how shared resources are used. If your tasks use the same resources too much, you may find that multi-threading is no faster, or is even slower.

answered Sep 27 '22 by Peter Lawrey

The problem with your application is that very soon all threads will have been assigned a transaction for the same account, and then all but one thread have to wait. You can see this in the following screenshot, where I paused the application. Thread pool-1-thread-3 is currently handling a transaction for the Account object with id 19 (this id is not your account id, but a unique object id Eclipse assigns), and all other threads are waiting for the lock on the same Account object. The Account object in question is the one whose account id is 9.

Screenshot of debugger

Why does this happen? In transaction 853, one thread starts the first long-running transaction (for account 9). The other threads continue to work on the other transactions. However, when any of the threads reaches another transaction for account 9, it has to stop and wait. Transactions 857, 861, and 862 are also for account 9, and each blocks one thread, so at this point all my threads are blocked (on my quad core).

How to solve this? This depends on your use case.

If in your actual program it is guaranteed that there is no incoming transaction for a given account X as long as there is another transaction running for account X, you do not need to change anything.

If your number of accounts is very large compared to the number of threads, the problem becomes much less likely, so you might decide to live with it.

If your number of accounts is relatively low (say, less than a hundred or so), you should (as Peter said) have a single (endlessly-running) thread per account, each with its own transaction queue; see the sketch below. This would probably be more efficient, because the threads do not need to "fight" over the shared queue.
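A rough sketch of that design, assuming a Transaction type with makeTransaction() as in the question (the AccountWorker class and its submit() method are names invented here for illustration):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: one endlessly-running worker thread per account, each with
// its own queue. Transactions for different accounts never contend,
// and no locking is needed because only one thread touches an account.
class AccountWorker implements Runnable {
    private final BlockingQueue<Transaction> queue = new LinkedBlockingQueue<>();

    void submit(Transaction t) {
        queue.add(t);
    }

    @Override
    public void run() {
        try {
            while (true) {
                queue.take().makeTransaction();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // allow orderly shutdown
        }
    }
}

A dispatcher would then route each incoming transaction to the worker for its account, e.g. via a Map from account id to AccountWorker.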

Another solution would be to implement some form of "work stealing". This means that whenever a thread would be blocked, it instead looks for some other work to do. To implement this, you first need to be able to check whether a thread could get the lock for a given account. With synchronized in Java this is not possible, so you need something like ReentrantLock.tryLock(). You also need to be able to directly access the transaction queue from each thread, so I guess you cannot use ExecutorService here but need to implement transaction handling yourself (using a LinkedBlockingQueue).

Now each thread would poll transactions from the queue in a loop. First it tries to acquire the lock for the respective account with tryLock(). If this fails, it adds the transaction to a (thread-specific) list, fetches the next transaction from the queue, and tries that one, until it finds a transaction it can handle. After a transaction has finished, it first looks in the list for transactions that are now possible to handle before pulling another transaction from the global queue. The code could look roughly like the following:

public BlockingQueue<Transaction> queue = ...; // the global queue for all threads

public void run() {
    LinkedList<Transaction> myTransactions = new LinkedList<>();
    try {
        while (true) {
            Transaction t = queue.take();
            // If the account is locked, park this transaction in the
            // local list and try the next one from the global queue.
            while (!t.getLock().tryLock()) {
                myTransactions.add(t);
                t = queue.take();
            }
            try {
                // here we hold the lock for t
                t.makeTransaction();
            } finally {
                t.getLock().unlock();
            }

            // Retry the locally parked transactions before taking more
            // work from the global queue.
            Iterator<Transaction> iter = myTransactions.iterator();
            while (iter.hasNext()) {
                t = iter.next();
                if (t.getLock().tryLock()) {
                    try {
                        t.makeTransaction();
                    } finally {
                        t.getLock().unlock();
                    }
                    iter.remove();
                }
            }
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // allow orderly shutdown
    }
}

Note that this still has at least the following problems you might want to address:

  • While a thread hangs in queue.take(), it does not check whether the transactions in its list have become available. So if there are periods of time where queue is empty (at the end of processing for example), there might be transactions stuck in the lists that are not being handled.
  • If a significant number of locks is being held by some of the threads, the remaining threads could take a lot of transactions that they cannot handle right now, so they would just fill their local lists, draining the global queue. When the locks are released, many transactions might have been removed from the global queue, creating an imbalance between the work the threads can do (some threads might be idling while others are still working on their long backlog of transactions).

A simpler alternative might be to put() transactions back into the queue (at the tail) if you cannot acquire the lock for them, but this would cause them to be executed in a very arbitrary order (which can happen with the above solution too, though maybe not as extremely). A sketch of this variant follows.
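Again this assumes each Transaction exposes its account's ReentrantLock via getLock(), as in the code above:

// Sketch of the simpler put-it-back variant: if the account is busy,
// re-enqueue the transaction at the tail and move on to the next one.
public void run() {
    try {
        while (true) {
            Transaction t = queue.take();
            if (t.getLock().tryLock()) {
                try {
                    t.makeTransaction();
                } finally {
                    t.getLock().unlock();
                }
            } else {
                queue.put(t); // retried later; ordering becomes arbitrary
            }
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
}

Note that if the queue temporarily contains only transactions for locked accounts, the threads will spin taking and re-adding them, burning CPU.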

Edit: A better solution might be to attach a queue to each account instead of using thread-specific lists. A thread would then add a transaction to the queue of the respective account whenever it finds that account locked. When a thread finishes a transaction for account X, it should first look in the queue of account X for deferred transactions before looking at the global list.
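A rough sketch of that idea; the Account fields and the handle() method are invented here for illustration:

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: each account carries its own backlog of deferred transactions.
class Account {
    final ReentrantLock lock = new ReentrantLock();
    final Queue<Transaction> backlog = new ConcurrentLinkedQueue<>();
    // ... id, balance, etc.
}

// Worker-side handling, assuming a Transaction knows its Account:
void handle(Transaction t) {
    Account a = t.getAccount();
    if (!a.lock.tryLock()) {
        a.backlog.add(t); // account busy: park it for the lock holder
        return;
    }
    try {
        t.makeTransaction();
        // Drain deferred transactions while we still hold the lock.
        Transaction next;
        while ((next = a.backlog.poll()) != null) {
            next.makeTransaction();
        }
    } finally {
        a.lock.unlock();
    }
}

There is still a race here: a transaction parked just after the drain but before unlock() waits until the next transaction for that account arrives, so a real implementation would need to re-check the backlog after releasing the lock.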

answered Sep 27 '22 by Philipp Wendler