 

What does "num_envs_per_worker" in rllib do?

Tags:

python

ray

rllib

For the life of me I don't get what "num_envs_per_worker" does. If the limiting factor is policy evaluation why would we need to create multiple environments? Wouldn't we need to create multiple policies?

ELI5 please?

The docs say:

Vectorization within a single process: Though many envs can achieve high frame rates per core, their throughput is limited in practice by policy evaluation between steps. For example, even small TensorFlow models incur a couple milliseconds of latency to evaluate. This can be worked around by creating multiple envs per process and batching policy evaluations across these envs. You can configure {"num_envs_per_worker": M} to have RLlib create M concurrent environments per worker. RLlib auto-vectorizes Gym environments via VectorEnv.wrap().

Src: https://ray.readthedocs.io/en/latest/rllib-env.html
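As a concrete illustration of the quoted docs, the setting is just one key in the trainer's config dict. Only `num_envs_per_worker` comes from the docs above; the surrounding keys and the environment name are illustrative assumptions:

```python
# Hypothetical trainer-config fragment. Only "num_envs_per_worker" is
# taken from the quoted docs; the other keys are illustrative.
config = {
    "env": "CartPole-v1",       # assumed example Gym environment
    "num_workers": 2,           # rollout worker processes (assumed)
    "num_envs_per_worker": 8,   # M concurrent envs per worker, batched per policy call
}
```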

Andriy Drozdyuk asked Sep 17 '25 01:09


1 Answer

Probably a bit late on this, but here's my understanding:

  • as the docs you cited mention, there's significant fixed per-call overhead in using TensorFlow (converting data into the appropriate structures, the overhead and coordination of passing data to the GPU, etc.)
  • however, you can call a TensorFlow model with a batch of data, and the execution time generally scales nicely: at worst linearly in the batch size, and when going from a single row to a few rows, often sub-linearly. E.g. if you are going to pass 1 row of data to a vector processing unit like a GPU (or specialised CPU instructions), you might as well pass as many rows as it can handle in one go; it won't actually take any more time (those parallel execution units would just have been sitting idle otherwise)
  • therefore, you want to batch up rows of data so that you pay the fixed per-call cost as infrequently as possible. One way of doing this is by having several RL environments executing in lockstep. Say you have 8 of them: each of the 8 environments produces its own observation, you stack these 8 observations and call your TensorFlow model once on the batch to produce 8 actions, use those to produce 8 new observations, and so on. Amortised, this will hopefully cost only about 1/8th as much in TensorFlow evaluation as each environment making its own TensorFlow calls.
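The lockstep pattern in the last bullet can be sketched without RLlib or TensorFlow at all. The sketch below uses a toy environment and a NumPy "policy" as stand-ins (all names here are made up for illustration); the point is that the policy is called once per step for all 8 environments, rather than 8 times:

```python
import numpy as np

NUM_ENVS = 8   # analogous to num_envs_per_worker
OBS_DIM = 4

class ToyEnv:
    """Stand-in environment that just returns random observations."""
    def __init__(self, seed):
        self.rng = np.random.default_rng(seed)

    def reset(self):
        return self.rng.standard_normal(OBS_DIM)

    def step(self, action):
        return self.rng.standard_normal(OBS_DIM)

def policy_batch(obs_batch):
    """One 'model call' on a whole batch of observations.
    The fixed per-call cost is paid once, regardless of batch size."""
    weights = np.ones(OBS_DIM)
    return (obs_batch @ weights > 0).astype(int)  # one action per env

envs = [ToyEnv(seed=i) for i in range(NUM_ENVS)]
obs = np.stack([env.reset() for env in envs])     # shape (NUM_ENVS, OBS_DIM)

for _ in range(3):                                # a few lockstep iterations
    actions = policy_batch(obs)                   # ONE policy call, not 8
    obs = np.stack([env.step(a) for env, a in zip(envs, actions)])

print(obs.shape)  # (8, 4)
```

This is roughly what RLlib's `VectorEnv.wrap()` automates: stepping M environments together and batching their observations into a single policy evaluation.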
Andrew Rosenfeld answered Sep 19 '25 12:09