Batch HTTP Request Performance gain

I want to know the performance gain from doing an HTTP batch request. Is it only reducing the number of round trips to one instead of n, where n is the number of HTTP requests? If so, I guess you could just keep the HTTP connection open, send your HTTP messages through it, and close it once finished to get the same performance gain.

M.Abulsoud asked Jan 05 '23 22:01

2 Answers

The performance gain from doing batch requests depends on what you are doing with them. However, as a general, application-agnostic overview, here you go:

If you can manage a keep-alive connection, you don't have to repeat the initial handshake for each request. That removes some overhead and certainly saves time spent handling subsequent packets along this connection. Because of this you can "pipeline" requests and decrease overall load latency (all else being equal). However, requests in HTTP/1.1 are still bound to be handled FIFO, so one slow response holds up everything queued behind it (head-of-line blocking; HTTP/2 allows asynchronous multiplexing). This is where batching is useful: since even with a keep-alive connection you can have this hangup, you can still see significant latency between requests.
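To make the keep-alive point concrete, here is a minimal self-contained sketch in Python. It uses a throwaway local server (the handler and paths are made up for illustration) as a stand-in for a real API; the client then issues several requests over a single `http.client.HTTPConnection`, so the TCP handshake happens only once.

```python
import http.client
import http.server
import json
import threading

# Hypothetical local server standing in for a real API so the sketch is
# self-contained; any HTTP/1.1 server with keep-alive behaves the same.
class Handler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.1 keeps the connection open by default

    def do_GET(self):
        body = json.dumps({"path": self.path}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))  # needed for keep-alive
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# One TCP connection, several requests: the handshake is paid only once.
conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
results = []
for i in range(3):
    conn.request("GET", f"/resource/{i}")
    resp = conn.getresponse()  # responses come back FIFO on HTTP/1.1
    results.append(json.loads(resp.read())["path"])
conn.close()
server.shutdown()
print(results)  # ['/resource/0', '/resource/1', '/resource/2']
```

Note the client must fully read each response before sending the next request; that is exactly the FIFO constraint described above.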

This can be mitigated further by batching. If possible, you lump all the data needed for the subsequent requests into one request, so everything is processed together and sent back as one response. Sure, that single request may take a bit longer to handle than any one request in the sequential approach, but your throughput over time is higher because the round-trip latency for request->response is paid once rather than multiplied. Thus you get an even better performance gain in terms of request-handling speed.
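The trade-off between the three strategies can be sketched with a toy latency model. The numbers below are assumptions for illustration, not measurements; the point is how the costs compose.

```python
# Toy latency model (illustrative numbers, not measurements) for fetching
# n resources with three different strategies.
HANDSHAKE_MS = 30   # assumed TCP (+TLS) connection setup cost
RTT_MS = 20         # assumed one request/response round trip
n = 50

# New connection per request: pay the handshake every single time.
no_keepalive = n * (HANDSHAKE_MS + RTT_MS)

# Keep-alive: one handshake, but still one round trip per request (FIFO).
keepalive = HANDSHAKE_MS + n * RTT_MS

# Batch: one handshake, one round trip carrying all n requests at once.
batched = HANDSHAKE_MS + RTT_MS

print(no_keepalive, keepalive, batched)  # 2500 1030 50
```

Real servers add per-request processing time that a batch endpoint pays only partially, so in practice the batched case sits somewhere above this floor, but the ordering holds.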

Naturally, how effective this approach is depends on what you're doing with the requests. Batching can also put too much stress on a server if many users are doing it with a lot of data, so to increase overall concurrent throughput across all users you sometimes need to take the technically slower sequential approach to balance things out. The best approach for your case will become clear with some simple monitoring and analysis.

And as always, don't optimize prematurely :)

Corvus Crypto answered Jan 08 '23 11:01


Consider this typical scenario: the client has the identifier of a resource that resides in a database behind an HTTP server, and wants to get an object representation of that resource.

The general flow to execute that goes like this:

  • The client code constructs an HTTP client.
  • The client builds a URI and sets the proper HTTP request fields.
  • Client issues the HTTP request.
  • Client OS initiates a TCP connection, which the server accepts.
  • Client sends the request to the server.
  • Server OS or webserver parses the request.
  • Server middleware parses the request components into a request for the server application.
  • Server application gets initialized, the relevant module is loaded and passed the request components.
  • The module obtains an SQL connection.
  • Module builds an SQL query.
  • The SQL server finds the record and returns that to the module.
  • Module parses the SQL response into an object.
  • Module selects the proper serializer through content negotiation, JSON in this case.
  • The JSON serializer serializes the object into a JSON string.
  • The response containing the JSON string is returned by the module.
  • Middleware returns this response to the HTTP server.
  • Server sends the response to the client.
  • Client fires up their version of the JSON serializer.
  • Client deserializes the JSON into an object.

And there you have it, one object obtained from a webserver.

Now each of those steps along the way is heavily optimized, because a typical server and client execute them so many times. However, even if each of those steps only takes a millisecond, when you have, for example, fifty resources to obtain, those milliseconds add up fast.

So yes, HTTP keep-alive cuts away the time the TCP connection takes to build up and warm up, but each and every other step will still have to be executed fifty times. Yes, there's SQL connection pooling, but every query to the database adds overhead.

So instead of going through this flow fifty separate times: if you have an endpoint that can accept fifty identifiers at once (for example through a comma-separated query string, or a POST with the identifiers in the body) and return their JSON representations in one response, that will always be way faster than fifty individual requests.
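A minimal sketch of such a batch endpoint, with an in-memory dict standing in for the database (the `/resources?ids=...` shape, the `handle_batch` function, and the record fields are all hypothetical, chosen just to illustrate the comma-separated-identifiers idea):

```python
import json
import urllib.parse

# Hypothetical in-memory "database"; a real module would run a single SQL
# query with WHERE id IN (...) instead of fifty single-row lookups.
DB = {i: {"id": i, "name": f"resource-{i}"} for i in range(100)}

def handle_batch(query_string):
    """Sketch of a batch endpoint: /resources?ids=1,2,3 -> JSON array."""
    params = urllib.parse.parse_qs(query_string)
    ids = [int(i) for i in params["ids"][0].split(",")]
    return json.dumps([DB[i] for i in ids if i in DB])

# The client builds one URI for fifty identifiers instead of fifty URIs,
# and deserializes one response instead of fifty.
qs = "ids=" + ",".join(str(i) for i in range(50))
objects = json.loads(handle_batch(qs))
print(len(objects))  # 50
```

Every step in the flow above (connection, parsing, SQL round trip, serialization) now runs once for the whole batch instead of once per resource.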

CodeCaster answered Jan 08 '23 11:01