GAE: What's the difference between <min-pending-latency> and <max-pending-latency>?

Question

As far as I can tell from the docs, both settings do the same thing: start a new instance when a request has spent longer in the pending queue than the setting specifies.

<max-pending-latency> The maximum amount of time that App Engine should allow a request to wait in the pending queue before starting a new instance to handle it. Default: "30ms".

A low maximum means App Engine will start new instances sooner for pending requests, improving performance but raising running costs.

A high maximum means users might wait longer for their requests to be served, if there are pending requests and no idle instances to serve them, but your application will cost less to run.

<min-pending-latency> The minimum amount of time that App Engine should allow a request to wait in the pending queue before starting a new instance to handle it.

A low minimum means requests must spend less time in the pending queue when all existing instances are active. This improves performance but increases the cost of running your application.

A high minimum means requests will remain pending longer if all existing instances are active. This lowers running costs but increases the time users must wait for their requests to be served.

Source: https://cloud.google.com/appengine/docs/java/config/appref

What's the difference between min and max then?

Yannick MG · Accepted Answer

The piece of information you might be missing to understand these settings is that App Engine can choose to create an instance at any time between min-pending-latency and max-pending-latency.

This means an instance will never be created to serve a pending request before min-pending-latency and will always be created once max-pending-latency has been reached.
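The rule in the two sentences above can be sketched as a small classifier. This is a minimal model, assuming both latencies are given in milliseconds; the function name `instance_creation_policy` and its return labels are hypothetical and not part of any App Engine API. How App Engine actually decides inside the window is not described here, so the model only reports that creation is permitted there.

```python
def instance_creation_policy(wait_ms: float, min_pending_ms: float, max_pending_ms: float) -> str:
    """Classify a request's time in the pending queue (hypothetical model).

    - below min_pending_ms: a new instance is never created ("never")
    - from min_pending_ms up to max_pending_ms: one *may* be created ("may")
    - at or beyond max_pending_ms: one is always created ("always")
    """
    if min_pending_ms > max_pending_ms:
        raise ValueError("min_pending_ms must not exceed max_pending_ms")
    if wait_ms < min_pending_ms:
        return "never"
    if wait_ms < max_pending_ms:
        return "may"
    return "always"


if __name__ == "__main__":
    # Illustrative bounds only: min 30 ms, max 100 ms.
    for wait in (10, 30, 75, 100, 250):
        print(wait, instance_creation_policy(wait, 30, 100))
```

With these bounds, waits of 10 ms map to "never", 30 and 75 ms to "may", and 100 and 250 ms to "always". The gap between the two settings is exactly the range in which App Engine has discretion.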

I believe the best way to understand this is to look at the timeline of events when a request enters the pending queue:

1. A request reaches the application, but no instance is available to serve it, so it is placed in the pending requests queue.
2. Until min-pending-latency is reached: App Engine tries to find an available instance to serve the request and will not create a new instance. If a request is served before this threshold is reached, that is a signal for App Engine to scale down.
3. After min-pending-latency is reached and until max-pending-latency is reached: App Engine keeps trying to find an available instance to serve the request, and may also decide to create a new instance.
4. After max-pending-latency is reached: App Engine stops searching for an available instance to serve the request and creates a new instance.
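To make the timeline concrete, here is a toy simulation of a single pending request. It assumes, purely for illustration, that an existing instance becomes free at `free_at_ms` (or never, if `None`) and that, inside the window, the scheduler may choose to start a new instance at `discretionary_ms`. Both parameters and the function `serve_pending_request` are hypothetical; the documentation does not say how App Engine makes that choice, and instance start-up time is ignored.

```python
from typing import Optional, Tuple


def serve_pending_request(
    min_pending_ms: float,
    max_pending_ms: float,
    free_at_ms: Optional[float],
    discretionary_ms: Optional[float] = None,
) -> Tuple[str, float, bool]:
    """Toy model of the timeline above, not App Engine's scheduler.

    Returns (server, t_ms, scale_down_signal):
      server: "existing" or "new"
      t_ms: when the request is dispatched to an existing instance, or when
            a new instance is started (start-up time ignored)
      scale_down_signal: True if served before min_pending_ms was reached
    """
    if min_pending_ms > max_pending_ms:
        raise ValueError("min_pending_ms must not exceed max_pending_ms")
    # A new instance is started at the discretionary point if it falls inside
    # [min, max); otherwise at the hard max_pending_ms deadline (step 4).
    create_at = max_pending_ms
    if discretionary_ms is not None and min_pending_ms <= discretionary_ms < max_pending_ms:
        create_at = discretionary_ms
    # Steps 2 and 3: an existing instance that frees up first takes the request.
    if free_at_ms is not None and free_at_ms < create_at:
        return "existing", free_at_ms, free_at_ms < min_pending_ms
    # Step 3 (discretionary) or step 4 (max reached): start a new instance.
    return "new", create_at, False


if __name__ == "__main__":
    print(serve_pending_request(30, 100, free_at_ms=10))    # existing, scale-down signal
    print(serve_pending_request(30, 100, free_at_ms=50))    # existing, inside the window
    print(serve_pending_request(30, 100, free_at_ms=None))  # new instance at max
    print(serve_pending_request(30, 100, free_at_ms=80, discretionary_ms=40))  # new at 40
```

A discretionary time before min-pending-latency is ignored in this model, which mirrors the rule that no instance is created before that threshold.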

Source: app.yaml automatic_scaling element


Tags:

google-app-engine

Dzmitry Lazerka
