As far as I can tell from the docs, both settings do the same thing: start a new instance once a request has spent longer in the pending queue than the setting allows.
<max-pending-latency>
The maximum amount of time that App Engine should allow a request to wait in the pending queue before starting a new instance to handle it. Default: "30ms".
- A low maximum means App Engine will start new instances sooner for pending requests, improving performance but raising running costs.
- A high maximum means users might wait longer for their requests to be served, if there are pending requests and no idle instances to serve them, but your application will cost less to run.
<min-pending-latency>
The minimum amount of time that App Engine should allow a request to wait in the pending queue before starting a new instance to handle it.
- A low minimum means requests must spend less time in the pending queue when all existing instances are active. This improves performance but increases the cost of running your application.
- A high minimum means requests will remain pending longer if all existing instances are active. This lowers running costs but increases the time users must wait for their requests to be served.
Source: https://cloud.google.com/appengine/docs/java/config/appref
What's the difference between min and max then?
The piece of information you might be missing to understand these settings is that App Engine can choose to create an instance at any time between min-pending-latency and max-pending-latency.
This means an instance will never be created to serve a pending request before min-pending-latency and will always be created once max-pending-latency has been reached.
I believe the best way to understand this is to look at the timeline of events when a request enters the pending queue:
Source: app.yaml automatic_scaling element
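The rule described above can also be written as a small decision function. This is only a sketch of the behaviour as I understand it, not App Engine's actual scheduler: the names `Decision` and `scaling_decision` and the 100 ms / 500 ms thresholds are made up for illustration, and it ignores the case where an existing instance frees up and serves the request first.

```python
from enum import Enum


class Decision(Enum):
    WAIT = "keep waiting, no new instance yet"
    MAY_START = "scheduler may start a new instance"
    MUST_START = "scheduler starts a new instance"


def scaling_decision(waited_ms, min_pending_ms, max_pending_ms):
    """Classify a pending request by how long it has waited.

    Hypothetical helper: models only the min/max window described above.
    """
    if min_pending_ms > max_pending_ms:
        raise ValueError("min-pending-latency must not exceed max-pending-latency")
    if waited_ms < min_pending_ms:
        # Before the minimum: never start an instance for this request.
        return Decision.WAIT
    if waited_ms < max_pending_ms:
        # Between minimum and maximum: starting an instance is allowed
        # but not guaranteed.
        return Decision.MAY_START
    # Maximum reached: an instance is started.
    return Decision.MUST_START


# Illustrative thresholds (assumed values, not defaults from the docs).
for waited in (50, 100, 300, 500):
    print(waited, scaling_decision(waited, 100, 500).name)
```

With these assumed thresholds, a request that has waited 50 ms stays in the queue, one that has waited between 100 ms and 500 ms may or may not trigger a new instance, and one that reaches 500 ms does trigger one.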