I have a service that transfers messages at quite a high rate.
Currently it is served by akka-tcp and handles 3.5M messages per minute. I decided to give gRPC a try. Unfortunately, it resulted in much lower throughput: ~500k messages per minute and even less.
Could you please recommend how to optimize it?
My setup
Hardware: 32 cores, 24 GB heap.
gRPC version: 1.25.0
Message format and endpoint
The message is basically a binary blob. The client streams 100K to 1M or more messages into the same request (asynchronously). The server doesn't respond with anything, and the client uses a no-op observer.
service MyService {
    rpc send (stream MyMessage) returns (stream DummyResponse);
}

message MyMessage {
    int64 someField = 1;
    bytes payload = 2; // not huge
}

message DummyResponse {
}
Problems:
The message rate is low compared to the akka implementation.
I observe low CPU usage, so I suspect that the gRPC call is actually blocking internally even though it claims to be asynchronous. Calling onNext() indeed doesn't return immediately, but GC may also be a factor.
I tried to spawn more senders to mitigate this issue but didn't get much improvement.
My findings
gRPC actually allocates an 8 KB byte buffer for each message when serializing it. See the stack trace:
java.lang.Thread.State: BLOCKED (on object monitor)
    at com.google.common.io.ByteStreams.createBuffer(ByteStreams.java:58)
    at com.google.common.io.ByteStreams.copy(ByteStreams.java:105)
    at io.grpc.internal.MessageFramer.writeToOutputStream(MessageFramer.java:274)
    at io.grpc.internal.MessageFramer.writeKnownLengthUncompressed(MessageFramer.java:230)
    at io.grpc.internal.MessageFramer.writeUncompressed(MessageFramer.java:168)
    at io.grpc.internal.MessageFramer.writePayload(MessageFramer.java:141)
    at io.grpc.internal.AbstractStream.writeMessage(AbstractStream.java:53)
    at io.grpc.internal.ForwardingClientStream.writeMessage(ForwardingClientStream.java:37)
    at io.grpc.internal.DelayedStream.writeMessage(DelayedStream.java:252)
    at io.grpc.internal.ClientCallImpl.sendMessageInternal(ClientCallImpl.java:473)
    at io.grpc.internal.ClientCallImpl.sendMessage(ClientCallImpl.java:457)
    at io.grpc.ForwardingClientCall.sendMessage(ForwardingClientCall.java:37)
    at io.grpc.ForwardingClientCall.sendMessage(ForwardingClientCall.java:37)
    at io.grpc.stub.ClientCalls$CallToStreamObserverAdapter.onNext(ClientCalls.java:346)
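To put the 8 KB figure in perspective, here is a rough back-of-the-envelope estimate of the allocation rate it would imply. It assumes exactly one 8 KB buffer per message, as described above, and ignores every other allocation. The class and method names are made up for this sketch.

```java
/** Hypothetical helper: estimates buffer churn from a per-message 8 KB allocation. */
final class AllocationEstimate {
    // Assumption taken from the observation above: one 8 KB buffer per message.
    static final long BUFFER_BYTES = 8 * 1024;

    /** Bytes allocated per second for a given message rate (messages per minute). */
    static long bytesPerSecond(long messagesPerMinute) {
        return messagesPerMinute * BUFFER_BYTES / 60;
    }

    public static void main(String[] args) {
        // Current gRPC rate and the akka-tcp rate I am trying to match.
        System.out.println("500k msg/min: " + bytesPerSecond(500_000) + " bytes/s");
        System.out.println("3.5M msg/min: " + bytesPerSecond(3_500_000) + " bytes/s");
    }
}
```

Under these assumptions that is roughly 68 MB/s of short-lived buffers at the current rate and roughly 478 MB/s at the akka-tcp rate, which would at least be consistent with the GC activity I see.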
Any help with best practices for building high-throughput gRPC clients would be appreciated.
One way to speed up programs is to make sure the CPU is not idling. To do this, we issue work concurrently. In gRPC Java, there are three types of stubs: blocking, non-blocking, and listenable future.
Multiple gRPC clients can be created from a channel, including different types of clients. A channel and clients created from the channel can safely be used by multiple threads. Clients created from the channel can make multiple simultaneous calls.
I solved the issue by creating several ManagedChannel instances per destination. Although articles say that a ManagedChannel can spawn enough connections by itself, so one instance should be enough, that wasn't true in my case.
Performance is now on par with the akka-tcp implementation.
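A minimal sketch of the idea, not my exact code: a small fixed-size pool that hands out its elements in round-robin order. `RoundRobinPool` is a hypothetical helper, not part of gRPC. It is written generically so it runs with the JDK alone; in the real client the element type would be `ManagedChannel`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical helper (not a gRPC class): fixed-size pool with round-robin selection. */
final class RoundRobinPool<T> {
    private final List<T> items;
    private final AtomicInteger counter = new AtomicInteger();

    /** Creates {@code size} elements up front using the given factory. */
    RoundRobinPool(int size, Supplier<T> factory) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<T> created = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            created.add(factory.get());
        }
        this.items = Collections.unmodifiableList(created);
    }

    /** Returns the next element; safe to call from multiple sender threads. */
    T next() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(counter.getAndIncrement(), items.size());
        return items.get(index);
    }

    int size() {
        return items.size();
    }
}
```

With gRPC, the factory would be something like `() -> ManagedChannelBuilder.forAddress(host, port).usePlaintext().build()`, and each sender would create its stub with `MyServiceGrpc.newStub(pool.next())`. The pool size is a tuning knob to find by measurement, and every channel in the pool still has to be shut down when the client stops.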