 

Architectural issue with Tomcat cluster environment

I am working on a project that has an authentication mechanism. It works as follows:

  1. The user opens a browser, enters his/her email in a text box and clicks the login button.
  2. The request goes to the server. We generate a random string (for example, 123456), send a notification to the user's Android/iPhone device and make the current thread wait using the wait() method.
  3. The user enters a password on his/her phone and clicks the submit button on the phone.
  4. When the user clicks submit, the phone app calls a web service on the server, passing the previously generated string (for example, 123456) and the password.
  5. If the password is correct for the previously entered email, we call notify() to wake the waiting thread, which sends "success" as the response, and the user is logged in to our system.
  6. If the password is incorrect for the previously entered email, we call notify() to wake the waiting thread, which sends "failed" as the response, and we display an invalid-credentials message to the user.
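Steps 2 to 6 can be sketched for a single JVM as follows. This is a minimal illustration, not the poster's code: `PendingLogin`, `PendingLogins` and their methods are hypothetical names, and the timeout and guard flag are additions. The detail that matters is that the token-to-waiter map lives in one JVM's heap, so a reply routed to another node finds nothing to notify.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// One outstanding login (hypothetical): the browser's request thread waits
// on it (step 2) and the phone's web-service request completes it (steps 5/6).
final class PendingLogin {
    private Boolean passwordOk; // null until the phone replies; guarded by "this"

    synchronized void complete(boolean ok) {
        passwordOk = ok;
        notifyAll(); // wake the waiting request thread
    }

    // Returns TRUE/FALSE from the phone, or null if no reply arrived in time.
    synchronized Boolean awaitResult(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        try {
            while (passwordOk == null) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return null;
                }
                wait(remaining);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
        return passwordOk;
    }
}

// Registry of pending logins held in THIS JVM's heap only (hypothetical).
final class PendingLogins {
    private final Map<String, PendingLogin> byToken = new ConcurrentHashMap<>();

    PendingLogin register(String token) {
        PendingLogin p = new PendingLogin();
        byToken.put(token, p);
        return p;
    }

    // Returns false when the token is unknown here, e.g. because the reply
    // reached a different cluster node from the one holding the waiting thread.
    boolean complete(String token, boolean passwordOk) {
        PendingLogin p = byToken.remove(token);
        if (p == null) {
            return false;
        }
        p.complete(passwordOk);
        return true;
    }

    // Called by the waiting thread after a timeout so stale entries do not pile up.
    void forget(String token) {
        byToken.remove(token);
    }
}
```

Calling `register` on one `PendingLogins` instance and `complete` on another, as if they were two cluster nodes, returns `false` and leaves the waiter to time out, which matches the symptom seen after the move to a cluster.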

Everything worked fine, but we recently moved to a clustered environment. Now some threads are never notified, even after the user has replied, and wait indefinitely.

For the server we use Tomcat 5.5, and we set up the cluster by following The Apache Tomcat 5.5 Servlet/JSP Container documentation.

Answer: possible problem and solution

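The likely problem is that the cluster runs multiple JVMs, so the phone's reply can reach a different JVM from the one holding the waiting thread. As a workaround, we now also send the URL of the specific Tomcat node to the user's Android application along with the generated string.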

When the user clicks the reply button, the app sends the generated string to that node's URL, so both requests reach the same JVM and it works fine.

But I am wondering whether there is a better solution to this issue.

There is also a problem with this workaround: what happens if that Tomcat node crashes? The load balancer will send the request to another node in the cluster, and the same problem will arise again.

Asked Dec 23 '12 at 11:12 by Rais Alam



2 Answers

The underlying reason for your problems is that Java EE was designed to work differently: blocking or waiting on a service thread is one of its important no-nos. I'll explain why first, and then how to solve the issue.

Java EE (both the web and EJB tiers) is designed to scale to very large deployments (hundreds of computers in a cluster). To make that possible, the designers had to make the following assumptions, which translate into specific limitations on how you code:

  • Transactions are:

    1. Short-lived (e.g. they don't block or wait for longer than a second or so)
    2. Independent of each other (e.g. no communication between threads)
    3. For EJBs, managed by the container
  • All user state is maintained in specific data storage containers, including:

    1. A data store accessed through, e.g., JDBC. You can use a traditional SQL database or a NoSQL backend
    2. Stateful session beans, if you use EJBs. Think of these as Java Beans whose fields the container keeps between a client's calls and can passivate (serialize) to secondary storage. Stateful session beans are managed by the container
    3. Web session: a key-value store (rather like a NoSQL database, but without the scale or search capabilities) that persists data for a specific user over their session. It's managed by the Java EE container and has the following properties:

      1. It is automatically relocated to another node if its node in the cluster crashes (when session replication is enabled)
      2. A user can have more than one current web session (e.g. in two different browsers)
      3. A web session ends when the user logs out, or when it has been inactive for longer than the configurable timeout.
      4. All stored values must be serializable so that they can be persisted or transferred between nodes in a cluster.
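Property 4 is why a design that parks a thread cannot survive a failover: a waiting thread and the monitor it waits on are not data that can be serialized. What can go into the session is a plain description of the pending login. A minimal sketch, assuming a hypothetical `PendingAuth` attribute class; the `ObjectOutputStream`/`ObjectInputStream` round trip only approximates what session replication does to an attribute.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Objects;

// Hypothetical session attribute describing an outstanding phone login.
// It holds only plain data, so it survives serialization; a Thread, or an
// Object being wait()-ed on, could not be moved to another node this way.
final class PendingAuth implements Serializable {
    private static final long serialVersionUID = 1L;

    enum Status { PENDING, APPROVED, DENIED }

    final String email;
    final String token;
    final Status status;
    final long createdAtMillis;

    PendingAuth(String email, String token, Status status, long createdAtMillis) {
        this.email = email;
        this.token = token;
        this.status = status;
        this.createdAtMillis = createdAtMillis;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PendingAuth)) return false;
        PendingAuth other = (PendingAuth) o;
        return createdAtMillis == other.createdAtMillis
                && email.equals(other.email)
                && token.equals(other.token)
                && status == other.status;
    }

    @Override
    public int hashCode() {
        return Objects.hash(email, token, status, createdAtMillis);
    }

    // Serialize and deserialize, roughly what happens when a session is copied to another node.
    static PendingAuth roundTrip(PendingAuth value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (PendingAuth) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A plain `Object` used as a monitor does not implement `Serializable`, and neither does `Thread`, so only the status fields can follow the user to another node.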

If we follow those rules, the Java EE container can successfully manage a cluster, including shutting down nodes, starting new ones and migrating user sessions, without any specific developer code. Developers write the graphical interface and the business logic - all the 'plumbing' is managed by configurable container features.

Also, at run time, the Java EE container can be monitored and managed by some pretty sophisticated software that can trace application performance and behavioural issues on a live system.

<snark>Well, that was the theory. Practice suggests there were some pretty important limitations that were missed, which led to AOP and code-injection techniques, but that's another story.</snark>

[There are many discussions around the 'net on this. One that focuses on EJBs is the question "Why is spawning threads in Java EE container discouraged?". Exactly the same applies to web containers such as Tomcat.]

Sorry for the essay, but this is important to your problem. Because of these limitations on threads, you should not block a web request while it waits for another, later request.

Another problem with the current design is deciding what should happen if the user becomes disconnected from the network, runs out of power, or simply gives up. Presumably you will time out, but after how long? A short timeout will be too soon for some customers, which will cause satisfaction problems. If the timeout is too long, you could end up with all of Tomcat's worker threads blocked and the server frozen. This opens your organisation up to a denial-of-service attack.
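The thread-exhaustion risk can be reproduced in miniature. In this sketch a fixed pool of two threads stands in for Tomcat's worker threads and a `CountDownLatch` stands in for waiting on the phone; the class name, method name and pool size are illustrative assumptions, not Tomcat settings.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo: a fixed pool stands in for Tomcat's worker threads.
final class WorkerStarvationDemo {
    // Returns true if an unrelated request could not run while every worker
    // was blocked waiting for a phone reply, and did run once they were released.
    static boolean starvesWhileBlocked() {
        ExecutorService workers = Executors.newFixedThreadPool(2);
        CountDownLatch phoneReplies = new CountDownLatch(1);
        try {
            // Two "login" requests each occupy a worker thread while they wait.
            for (int i = 0; i < 2; i++) {
                workers.submit(() -> {
                    phoneReplies.await();
                    return null;
                });
            }
            // A third, unrelated request can only queue behind them.
            Future<String> other = workers.submit(() -> "served");
            try {
                other.get(200, TimeUnit.MILLISECONDS);
                return false; // it ran, so no starvation was observed
            } catch (TimeoutException expected) {
                // No free worker: the request is stuck in the queue.
            }
            phoneReplies.countDown(); // the phones finally answer
            return "served".equals(other.get(2, TimeUnit.SECONDS));
        } catch (InterruptedException | ExecutionException | TimeoutException e) {
            if (e instanceof InterruptedException) {
                Thread.currentThread().interrupt();
            }
            return false;
        } finally {
            workers.shutdownNow();
        }
    }
}
```

With both workers blocked, the unrelated request waits in the queue and is served only once the latch is released. A real connector has a larger pool, but it is exhausted in the same way once enough logins are pending at once.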

EDIT: Improved suggestions after a more detailed description of the algorithm was posted.

Notwithstanding the discussion above about the bad practice of blocking a web worker thread and the possible denial of service, it's clear that the user has only a small time window in which to react to the notification on the Android phone, and this window can be kept reasonably small to enhance security. It can also be kept below Tomcat's timeout for responses. So the thread-blocking approach could be used.

There are two ways this problem can be resolved:

  1. Shift the solution to the client end: poll the server using JavaScript in the browser
  2. Communicate between the nodes in the cluster, so that the node receiving the authorization response from the Android app can unblock the node that is holding back the servlet's response.

For approach 1, the browser polls the server with JavaScript, making an AJAX call to a web service on Tomcat; the call returns true once the Android app has authenticated. Advantages: the logic is client-side, the server implementation is minimal, and no thread blocks on the server. Disadvantage: during the waiting period the browser has to make frequent calls (perhaps one a second; the user will not notice this latency), which adds up to a lot of calls and some additional load on the server.
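On the server, approach 1 needs little more than a status lookup that the AJAX endpoint returns. A minimal sketch, assuming a hypothetical `LoginStatusStore`; the `ConcurrentHashMap` only stands in for storage that all nodes share (for example the SQL database), because in a cluster each poll may land on a different node.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical status store for browser polling. In a real cluster this map
// would be replaced by storage shared by all nodes, such as a database table.
final class LoginStatusStore {
    enum Status { PENDING, APPROVED, DENIED, UNKNOWN }

    private final Map<String, Status> byToken = new ConcurrentHashMap<>();

    // Login request from the browser: record the token and return at once.
    void start(String token) {
        byToken.put(token, Status.PENDING);
    }

    // Web-service call from the phone; only a still-pending login can be decided.
    boolean complete(String token, boolean passwordOk) {
        return byToken.replace(token, Status.PENDING,
                passwordOk ? Status.APPROVED : Status.DENIED);
    }

    // What the AJAX endpoint returns on each poll; no thread ever blocks.
    Status poll(String token) {
        return byToken.getOrDefault(token, Status.UNKNOWN);
    }
}
```

The answer describes the endpoint returning true once the app has authenticated; returning a three-way status, as assumed here, also lets the browser show the invalid-credentials message without waiting for a timeout.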

For approach 2, there is again choice:

  1. Block the thread with Object.wait(), optionally storing the node ID, IP or other identifier in a shared data store. In that case, the node receiving the Android app's authorization needs to:

    1. Either find the node that is currently blocking or broadcast to all nodes in the cluster
    2. For each node in 1. above, send a message that identifies the user session to unblock. The message could be sent via:

      1. An internal-only servlet on each node, called by the servlet performing the Android app authorization. The internal servlet calls Object.notify() on the correct thread.
      2. A JMS publish/subscribe topic that broadcasts to all members of the cluster. Each node is a subscriber that, on receipt of a notification, calls Object.notify() on the correct thread.
  2. Poll a data store until the thread is authorized to continue. In this case, all the server needs to do when the Android app replies is save the state in a SQL database.
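Option 2.1 with a broadcast can be modelled in a single JVM to show the message flow: each node keeps its own map of blocked requests, and the broadcast is a loop asking every node to deliver the reply. `ClusterNode`, `Waiter` and `broadcastReply` are hypothetical; in a deployment the loop would be replaced by calls to the internal-only servlet on each node or by a JMS topic, neither of which is shown here.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// A blocked browser request on one node, using a guarded wait()/notifyAll().
final class Waiter {
    private Boolean result; // null until a reply is delivered; guarded by "this"

    synchronized void deliver(boolean ok) {
        result = ok;
        notifyAll();
    }

    // Returns the delivered result, or null if nothing arrived in time.
    synchronized Boolean await(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        try {
            while (result == null) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return null;
                }
                wait(remaining);
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}

// Hypothetical cluster node: only the node that blocked knows the Waiter.
final class ClusterNode {
    private final Map<String, Waiter> waiters = new ConcurrentHashMap<>();

    Waiter block(String token) {
        Waiter w = new Waiter();
        waiters.put(token, w);
        return w;
    }

    // Handler for the internal message; nodes without the token ignore it.
    boolean onReply(String token, boolean ok) {
        Waiter w = waiters.remove(token);
        if (w == null) {
            return false;
        }
        w.deliver(ok);
        return true;
    }

    // Whichever node receives the phone's reply broadcasts it to every node.
    static boolean broadcastReply(List<ClusterNode> cluster, String token, boolean ok) {
        boolean delivered = false;
        for (ClusterNode node : cluster) {
            delivered |= node.onReply(token, ok);
        }
        return delivered;
    }
}
```

Nodes that do not hold the token simply ignore the message, so the node that receives the phone's reply needs no knowledge of where the browser's request landed.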

Answered Sep 29 '22 at 21:09 by Andrew Alcock


Using wait/notify can be tricky. Remember that any thread can be suspended at any time, so it's possible for notify to be called before wait, in which case wait will then block forever.

I wouldn't expect this in your case, as user interaction is involved. But for the type of synchronisation you are doing, try using a Semaphore. Create a Semaphore with zero permits. The waiting thread calls acquire(), which blocks until another thread calls release().

Using a Semaphore in this way is much more robust than wait/notify for the task you described.
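A minimal sketch of the Semaphore approach, using `java.util.concurrent.Semaphore` created with zero permits. `PhoneReply`, `deliver` and `await` are hypothetical names; the timed `tryAcquire` is an addition so that a user who never answers cannot hold a worker thread forever.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical one-shot hand-off from the phone's request to the browser's request.
final class PhoneReply {
    private final Semaphore replied = new Semaphore(0); // zero permits: acquiring blocks
    private volatile boolean passwordOk;

    // Phone side: record the answer, then release one permit.
    void deliver(boolean ok) {
        passwordOk = ok;
        replied.release();
    }

    // Browser side: the phone's answer, or null on timeout. A release() that
    // happened before this call is not lost, because the permit is still there.
    Boolean await(long timeout, TimeUnit unit) {
        try {
            return replied.tryAcquire(timeout, unit) ? passwordOk : null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```

Because the permit is counted, the order of `deliver` and `await` does not matter, which removes the lost-notification race described for wait/notify. Within one JVM this fixes robustness only; it does not by itself route the reply to the right cluster node.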

Answered Sep 29 '22 at 19:09 by David Roussel