 

Is Service Fabric Actor performance unreliable?

I'm working with a Service Fabric application that I can't get to perform as well as I had hoped.

The main issue is related to one actor calling another. I log how long a given call takes as seen from the calling actor, and I also log the time spent inside the receiving actor.

What I see is that the receiving actor logs that the workload takes a few milliseconds (20 at most). However, the calling actor logs anything from 50 ms to well over 2 seconds. The delay I cannot account for occurs before the actual logic runs. Once the method returns, the calling actor gets the response quickly.
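The measurement described above boils down to subtracting the callee's own timing from the caller's round-trip time. Below is a minimal Python simulation of that bookkeeping, not Service Fabric code: `receiving_actor_method`, `call_actor`, and `timed_call` are hypothetical names, and the injected `pre_call_delay_s` is an assumed stand-in for whatever happens before the receiving method starts (resolution, connection setup, queuing).

```python
import asyncio
import time


# Hypothetical stand-in for the receiving actor's method: it times its own
# workload and returns that duration with the result, mirroring the
# "time spent on the receiving actor" log in the question.
async def receiving_actor_method(workload_s: float) -> tuple[str, float]:
    start = time.perf_counter()
    await asyncio.sleep(workload_s)  # simulated business logic
    return "ok", time.perf_counter() - start


# Hypothetical stand-in for the actor proxy call (NOT a Service Fabric API).
# The injected delay models work done before the callee's logic runs.
async def call_actor(workload_s: float, pre_call_delay_s: float) -> tuple[str, float]:
    await asyncio.sleep(pre_call_delay_s)
    return await receiving_actor_method(workload_s)


async def timed_call(workload_s: float, pre_call_delay_s: float) -> dict:
    start = time.perf_counter()
    result, callee_s = await call_actor(workload_s, pre_call_delay_s)
    caller_s = time.perf_counter() - start
    # Anything the caller observed beyond the callee's own time is overhead
    # that happened outside the actor method itself.
    return {
        "result": result,
        "caller_s": caller_s,
        "callee_s": callee_s,
        "overhead_s": caller_s - callee_s,
    }


if __name__ == "__main__":
    m = asyncio.run(timed_call(workload_s=0.02, pre_call_delay_s=0.1))
    print(
        f"caller {m['caller_s'] * 1000:.0f} ms, "
        f"callee {m['callee_s'] * 1000:.0f} ms, "
        f"overhead {m['overhead_s'] * 1000:.0f} ms"
    )
```

Logging the overhead figure per call, rather than only the two raw timings, makes it easier to see whether the unexplained time is constant or spiky.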

Is this to be expected? It is definitely worst when creating a brand new actor instance, but I see this sort of thing even when calling an actor I made a different call to moments earlier.

The parameters passed are fairly basic, so I don't suspect deserialization is the issue.

I realize that actors will get distributed inside the cluster, but overhead on this scale seems out of proportion.

So, my question is: Is this "as expected" or does it indicate we're doing something wrong?

I will add that this is in a quiet test environment, so actors being locked up by other requests is not the issue.

I can provide more info upon request, but I'm not quite sure what might be most relevant.

Christian Rygg asked Jan 25 '18 at 01:01



1 Answer

There are many variables to consider in your scenario, and the bottleneck could be anywhere. As you may be aware, calling an actor and getting a response involves many steps. I will list a few common ones so you can investigate further.

  • The first step is finding out where your actor is located: the caller goes through the actor proxy, which resolves the actor's address via the Naming Service. The first call takes a while because the address has to be discovered; subsequent calls to the same actor use the cached address.
  • The connection between the caller and the actor needs to be established, and if they are on different nodes this adds extra network latency to your call.
  • Serializing your message and response also takes a few milliseconds, and depending on the size of your message it can take a considerable amount of time.
  • The actor activation process might have to do some work before handling the request, such as loading, saving, or synchronizing the actor state.
  • Actor thread synchronization (turn-based concurrency): if you hit the same actor concurrently, the calls are queued and processed one at a time, in order. So if you make 5 simultaneous calls to the same actor and each takes about 1 second to process, one of those calls will take around 5 seconds to complete, most of it spent waiting.
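The first point above (only the first call pays for address discovery) can be illustrated with a small cache-in-front-of-a-lookup sketch. This is a hypothetical simulation, not the Service Fabric proxy: `CachingResolver`, `_naming_service_lookup`, `lookup_cost_s`, and the address string are all made up for illustration.

```python
import time


class CachingResolver:
    """Hypothetical sketch of proxy-side address caching (not the Service
    Fabric API): the first lookup for an actor ID pays a simulated Naming
    Service round trip; later lookups for that ID are served from a dict."""

    def __init__(self, lookup_cost_s: float):
        self.lookup_cost_s = lookup_cost_s
        self._cache: dict[str, str] = {}
        self.remote_lookups = 0  # how many simulated remote resolutions ran

    def _naming_service_lookup(self, actor_id: str) -> str:
        # Simulated remote resolution; the address format is a placeholder.
        time.sleep(self.lookup_cost_s)
        self.remote_lookups += 1
        return f"placeholder-address-for/{actor_id}"

    def resolve(self, actor_id: str) -> str:
        # Only a cache miss pays the lookup cost.
        if actor_id not in self._cache:
            self._cache[actor_id] = self._naming_service_lookup(actor_id)
        return self._cache[actor_id]


if __name__ == "__main__":
    resolver = CachingResolver(lookup_cost_s=0.05)
    for attempt in range(3):
        start = time.perf_counter()
        address = resolver.resolve("actor-42")
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"call {attempt + 1}: {address} in {elapsed_ms:.1f} ms")
```

A real proxy also has to re-resolve when a cached address goes stale (for example after a failover); this sketch deliberately ignores that.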

So, taking these basic points into account, your calls may be paying for network and discovery latency, serialization, concurrency scheduling, and actor activation and state synchronization.

Based on your scenario, I would suspect concurrency more than anything else. Most likely something is locking the actor before or after the subsequent requests.
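To see how queuing alone can inflate caller-side latency while the callee still logs only its own work time, here is a minimal simulation. It assumes turn-based concurrency can be approximated by one FIFO lock per actor; `TurnBasedActor` and `caller_latencies` are hypothetical names, not the Reliable Actors API, and the 1 second per call from the example is scaled down to 0.1 seconds.

```python
import asyncio
import time


class TurnBasedActor:
    """Hypothetical sketch of turn-based actor concurrency: a per-actor
    lock lets only one call run at a time, so concurrent callers queue."""

    def __init__(self, work_s: float):
        self.work_s = work_s
        self._turn = asyncio.Lock()  # asyncio.Lock wakes waiters in FIFO order

    async def handle(self) -> None:
        async with self._turn:  # wait for this actor's turn
            # Time measured here would be what the callee logs: only the work.
            await asyncio.sleep(self.work_s)  # simulated method body


async def caller_latencies(n_calls: int, work_s: float) -> list[float]:
    actor = TurnBasedActor(work_s)

    async def one_call() -> float:
        start = time.perf_counter()
        await actor.handle()
        return time.perf_counter() - start  # what the caller logs

    # Fire all calls at once, as concurrent clients would.
    return await asyncio.gather(*(one_call() for _ in range(n_calls)))


if __name__ == "__main__":
    latencies = asyncio.run(caller_latencies(n_calls=5, work_s=0.1))
    print([f"{x:.2f}s" for x in sorted(latencies)])
```

With five simultaneous calls the caller-side latencies should come out at roughly 0.1 s, 0.2 s, and so on up to about 0.5 s, even though each call's own work is only 0.1 s.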

Diego Mendes answered Nov 06 '22 at 05:11
