
Why do you use `stream` for gRPC/protobuf file transfers?

I've seen a couple examples like this:

syntax = "proto3";

import "google/protobuf/empty.proto";

service Service {
  rpc upload(stream Data) returns (google.protobuf.Empty) {};
  rpc download(google.protobuf.Empty) returns (stream Data) {};
}

message Data { bytes bytes = 1; }

What is the purpose of using `stream`? Does it make the transfer more efficient?

In theory, yes: I obviously want to stream my file transfers, but that's what happens over a connection anyway. So what is the actual benefit of this keyword? Does it enforce some form of special buffering to reduce overhead? Either way, the data is transmitted in full!

Tobi Akinyemi asked Nov 15 '25 22:11



1 Answer

The answer is very similar to the gRPC + Image Upload question, although from a different perspective.

Doing a large download (10+ MB) as a single response message puts strict limits on the size of that download, because the entire response message is sent and processed at once. For most use cases, it is much better to split a 100 MB file into 1-10 MB chunks than to require all 100 MB to be in memory at once. Chunking also lets the downloader begin processing the file before the entire file has arrived, which reduces processing latency.
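To make the memory argument concrete, here is a minimal sketch in plain Python (no gRPC involved). The names `read_chunks`, `digest_stream`, and `CHUNK_SIZE` are made up for illustration. The producer holds only one chunk in memory at a time, and the consumer starts work (here, hashing) on the first chunk instead of waiting for the whole file.

```python
import hashlib
import io
from typing import BinaryIO, Iterable, Iterator

# Assumed chunk size, picked from the 1-10 MB range discussed above.
CHUNK_SIZE = 1024 * 1024


def read_chunks(f: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield the file's contents chunk by chunk; only one chunk is held at a time."""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            return
        yield chunk


def digest_stream(chunks: Iterable[bytes]) -> str:
    """Consume chunks as they arrive, updating a SHA-256 digest incrementally."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


if __name__ == "__main__":
    # An in-memory stand-in for a large file on disk.
    payload = io.BytesIO(b"x" * (3 * CHUNK_SIZE + 5))
    print(digest_stream(read_chunks(payload)))
```

A generator like `read_chunks` is the general shape of thing a streaming call would be fed, with each chunk wrapped in a `Data` message; that wiring is left out here because it depends on the generated stubs.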

Without streaming, chunking would require multiple RPCs, which are annoying to coordinate and have performance complications. Because each RPC takes time to complete, for reasonable performance you either have to do many RPCs in parallel (but how many?) or use a large batch size (but how big?). Multiple RPCs can also hit colder application caches, as each RPC may go to a different backend.

Streaming provides the same throughput as the non-chunking approach without as many of the headaches of ordinary chunking. Because streaming is pipelined (the server can start sending the next chunk as soon as the previous chunk is sent), there is no added per-chunk latency between the client and server. This makes it much easier to choose a chunk size, as there is a wide range of "reasonable" sizes that will behave similarly, and the system will naturally adapt as network performance varies.
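A rough back-of-the-envelope model shows why per-chunk latency matters. This is a deliberately simplified sketch, not a measurement: it assumes a fixed round-trip time and a fixed bandwidth, and it ignores flow control, per-message overhead, and parallel RPCs. The function names and the example numbers are made up for illustration.

```python
import math


def sequential_unary_seconds(total_bytes: int, chunk_bytes: int,
                             rtt_s: float, bandwidth_bytes_per_s: float) -> float:
    """One RPC per chunk, each waiting for the previous reply.

    Every chunk pays a full round trip on top of its transfer time.
    """
    n_chunks = math.ceil(total_bytes / chunk_bytes)
    return n_chunks * rtt_s + total_bytes / bandwidth_bytes_per_s


def streaming_seconds(total_bytes: int, rtt_s: float,
                      bandwidth_bytes_per_s: float) -> float:
    """One streaming RPC: a single round trip, then chunks flow back to back."""
    return rtt_s + total_bytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    MB = 1_000_000
    total, rtt, bw = 100 * MB, 0.05, 100 * MB  # 100 MB file, 50 ms RTT, 100 MB/s
    for chunk in (1 * MB, 10 * MB):
        seq = sequential_unary_seconds(total, chunk, rtt, bw)
        stream = streaming_seconds(total, rtt, bw)
        print(f"chunk={chunk // MB} MB  sequential={seq:.2f}s  streaming={stream:.2f}s")
```

With these assumed numbers, sequential 1 MB RPCs take about 6 s and 10 MB RPCs about 1.5 s, while the stream takes about 1.05 s. Chunk size drops out of the streaming model entirely, which is by construction: in practice very small chunks add per-message overhead and very large ones bring back the memory problem, but the range in between behaves similarly.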

While sending a message on an existing stream has less overhead than creating a new RPC, for many users the difference is negligible, and it is generally better to structure your RPCs in a way that is architecturally beneficial to your application rather than just to eke out small optimizations in gRPC. The reason to use a stream in this case is to make your application perform better at lower complexity.

Eric Anderson answered Nov 18 '25 21:11



