Right design in akka. - Message delivery

Tags:

I have gone through some posts on how and why akka does not guarantee message delivery. The documentation, this discussion and the other discussions on group do explain it well.

I am pretty new to akka and wish to know the appropriate design for a case. For example say I have 3 different actors all on different machines. One is responsible for cookbooks, the other for history and the last for technology books.

I have a main actor on another machine. Suppose there is a query to the main-actor to search if we have some book available. The main actor sends requests to the 3 remote actors, and expects the result. So I do this:

  val scatter = system.actorOf(         Props[SearchActor].withRouter(ScatterGatherFirstCompletedRouter(               routees=someRoutees, within = 10 seconds)), "router")   implicit val timeout = Timeout(10 seconds)   val futureResult = scatter ?  Text("Concurrency in Practice")   //      What should I do here?.   //val result = Await.result(futureResult, timeout.duration) line(a)

In short, I have sent requests to all 3 remote actors and expect the result in 10 seconds.

What should be the action?

Say I do not get the result in 10 seconds, should I send a new request to all of them again?
What if within time above is premature. But I do not know pre-hand on how much time it might take.
What if within time was sufficient but the message got dropped.

If i dont get response in within time and resend the request again. Something like this, it remain asynchronous:

futureResult onComplete{   case Success(i) => println("Result "+i)   case Failure(e) => //send again }

But under too many queries, wont it be too many threads on the call and bulky? If I uncomment line(a), it becomes synchronous and under load might perform badly.

Say I dont get response in 10 seconds. If within time was premature, then its a heavy useless computation happening again. If messsage got dropped, then 10 seconds of valuable time wasted. In case, say I knew that the message got delivered, I would probably wait for longer duration without being skeptical.

How do people solve such issues? ACK? But then I have to store the state in actor of all queries. It must be a common thing and I am looking for right design.

834

asked May 29 '13 10:05

Jatin

1 Answers

I'm going to try and answer some of these questions for you. I'm not going to have concrete answers for everything, but hopefully I can guide you in the right direction.

For starters, you will need to make a change in how you are communicating the request to the 3 actors that do book searches. Using a ScatterGatherFirstCompletedRouter is probably not the correct approach here. This router will only wait for an answer from one of the routees (the first one to respond), so your set of results will be incomplete as it will not contain results from the other 2 routees. There is also a BroadcastRouter, but that will not fit your needs either as it only handles tell (!) and not ask (?). To do what you want to do, one option is to send the request to each receipient, getting Futures for the responses and then combine them into an aggregate Future using Future.sequence. A simplified example could look like this:

case class SearchBooks(title:String) case class Book(id:Long, title:String)  class BookSearcher extends Actor{    def receive = {     case req:SearchBooks =>       val routees:List[ActorRef] = ...//Lookup routees here       implicit val timeout = Timeout(10 seconds)       implicit val ec = context.system.dispatcher        val futures = routees.map(routee => (routee ? req).mapTo[List[Book]])       val fut = Future.sequence(futures)        val caller = sender //Important to not close over sender       fut onComplete{         case Success(books) => caller ! books.flatten          case Failure(ex) => caller ! Status.Failure(ex)       }   } }

Now that's not going to be our final code, but it's an approximation of what your sample was attempting to do. In this example, if any one of the downstream routees fails/times out, we will hit our Failure block, and the caller will also get a failure. If they all succeed, the caller will get the aggregate List of Book objects instead.

Now onto your questions. First, you ask if you should send a request to all of the actors again if you do not get an answer from one of the routees within the timeout. The answer to this question really up to you. Would you allow your user on the other end to see a partial result (i.e. the results from 2 of the 3 actors), or does it always have to be the full set of results every time? If the answer is yes, you could tweak the code that is sending to the routees to look like this:

val futures = routees.map(routee => (routee ? req).mapTo[List[Book]].recover{   case ex =>     //probably log something here     List() })

With this code, if any of the routees timesout or fails for any reason, an empty list of 'Book` will be substituted in for the response instead of the failure. Now, if you can't live with partial results, then you could resend the entire request again, but you have to remember that there is probably someone on the other end waiting for their book results and they don't want to wait forever.

For your second question, you ask if what if your timeout is premature? The timeout value you select is going to be completely up to you, but it most likely should be based on two factors. The first factor will come from testing the call times of the searches. Find out on average how long it takes and select a value based on that with a little cushion just to be safe. The second factor is how long someone on the other end is willing to wait for their results. You could just be very conservative in your timeout, making it like 60 seconds just to be safe, but if there is indeed someone on the other end waiting for results, how long are they willing to wait? I'd rather get a failure response indicating that I should try again instead of waiting forever. So taking those two factors into account, you should select a value that will allow you to get responses a very high percentage of the time while still not making the caller on the other end wait too long.

For question 3, you ask what happens if the message gets dropped. In this case I'm guessing that the future for whoever was to receive that message will just timeout because it will not get a response because the recipient actor will never receive a message to respond to. Akka is not JMS; it doesn't have acknowledgement modes where a message can be resent a number of times if the recipient does not receive and ack the message.

Also, as you can see from my example, I agree with not blocking on the aggregate Future by using Await. I prefer using the non-blocking callbacks. Blocking in a receive function is not ideal as that Actor instance will stop processing its mailbox until that blocking operation completes. By using a non-blocking callback, you free that instance up to go back to processing its mailbox and allow the handling of the result to be just another job that is executed in the ExecutionContext, decoupled from the actor processing its mailbox.

Now if you really want to not waste communications when the network is not reliable, you could look into the Reliable Proxy available in Akka 2.2. If you don't want to go this route, you could roll it yourself by sending ping type messages to the routees periodically. If one does not respond in time, you mark it as down and do not send messages to it until you can get a reliable (in a very short amount of time) ping from it, sort of like a FSM per routee. Either of these can work if you absolutely need this behavior, but you need to remember that these solutions add complexity and should only be employed if you absolutely need this behavior. If you're developing bank software and you absolutely need guaranteed delivery semantics as bad financial implications will result otherwise, by all means go with this kind of approach. Just be judicious in deciding if you need something like this because I bet 90% of the time you don't. In your model, the only person probably affected by waiting on something that you might have already known won't be successful is the caller on the other end. By using non-blocking callbacks in the actor, it's not being halted by the fact that something might take a long time; it's already moved in to its next message. You also do need to be careful if you decide to resubmit on failure. You don't want to flood the receiving actors mailboxes. If you decide to resend, cap it at a fixed number of times.

One other possible approach if you need these guaranteed kind of semantics might be to look into Akka's Clustering Model. If you clustered the downstream routees, and one of the servers was failing, then all traffic would be routed to the node that was still up until that other node recovered.

125

answered Jan 10 '23 20:01

cmbaxter

Related questions
                            
                                Mark CSS rule as less important?
                            
                                C++: char** to const char** conversion [duplicate]
                            
                                How to persist enum value in slick
                            
                                Is it possible to plot the smooth components of a gam fit with ggplot2?
                            
                                Symfony2 post-update-cmd gives "An error occurred when generating the bootstrap file"
                            
                                Draft commits on mercurial
                            
                                When does command substitution spawn more subshells than the same commands in isolation?
                            
                                JavaScript multiplying by 100 giving weird result [duplicate]
                            
                                Are all functions "noexcept" if exceptions are disabled?
                            
                                When do you use std::unordered_map::emplace_hint?
                            
                                Using Selenium WebDriver with Tor
                            
                                What is the purpose of a bare asterisk in function arguments?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With