I have access to a computer with many CPU cores (56, specifically), and when training models with TensorFlow I would like to make maximum use of those cores by turning each one into an independent trainer of the model.
In TensorFlow's documentation I found two parameters (inter-op and intra-op parallelism) that control parallelism while training models. However, these two parameters do not let me do what I intend.
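For reference, those settings live in the session configuration. A minimal sketch of how they are set (the thread counts here are just placeholders):

```python
import tensorflow as tf

# The two parameters mentioned above are fields of the session config.
config = tf.ConfigProto(
    inter_op_parallelism_threads=56,  # parallelism across independent ops in the graph
    intra_op_parallelism_threads=56,  # parallelism within a single op (e.g. a matmul)
)
sess = tf.Session(config=config)
```

They only control how a single training process uses threads; they do not create independent workers.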
How can I make each core an independent worker? That is, batches of samples are sharded across the workers, each worker computes gradients on the samples it was assigned, and then each worker updates the variables (which are shared by all the workers) according to the gradients it has computed.
To parallelize effectively across all 56 CPU cores, you will have to use Distributed TensorFlow. It is also possible to parallelize using threading, but that will not scale well to so many cores.
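Below is a minimal TF 1.x sketch of the classic between-graph replication setup; the cluster layout, ports, toy model, and hyperparameters are all placeholders. One parameter-server process holds the shared variables, and each worker process computes gradients on its own batches and applies them to those shared variables. You would launch one process per worker (for example one per core, or per small group of cores).

```python
import argparse
import numpy as np
import tensorflow as tf

# Hypothetical cluster: one parameter server and two workers, all on localhost.
# Scaling out means adding more entries to the "worker" list and starting
# one process per entry.
cluster = tf.train.ClusterSpec({
    "ps": ["localhost:2222"],
    "worker": ["localhost:2223", "localhost:2224"],
})

parser = argparse.ArgumentParser()
parser.add_argument("--job_name", choices=["ps", "worker"], required=True)
parser.add_argument("--task_index", type=int, default=0)
args = parser.parse_args()

server = tf.train.Server(cluster, job_name=args.job_name, task_index=args.task_index)

if args.job_name == "ps":
    # The parameter server just hosts the shared variables.
    server.join()
else:
    # Between-graph replication: variables are placed on the ps task,
    # while the ops below run on this worker.
    with tf.device(tf.train.replica_device_setter(
            worker_device="/job:worker/task:%d" % args.task_index,
            cluster=cluster)):
        x = tf.placeholder(tf.float32, [None, 10])
        y = tf.placeholder(tf.float32, [None, 1])
        w = tf.get_variable("w", [10, 1])
        b = tf.get_variable("b", [1])
        loss = tf.reduce_mean(tf.square(tf.matmul(x, w) + b - y))
        global_step = tf.train.get_or_create_global_step()
        train_op = tf.train.GradientDescentOptimizer(0.01).minimize(
            loss, global_step=global_step)

    # MonitoredTrainingSession coordinates variable initialization across workers.
    hooks = [tf.train.StopAtStepHook(last_step=1000)]
    with tf.train.MonitoredTrainingSession(
            master=server.target,
            is_chief=(args.task_index == 0),
            hooks=hooks) as sess:
        while not sess.should_stop():
            # Feed this worker's shard of the data here; random data as a stand-in.
            xs = np.random.rand(32, 10).astype(np.float32)
            ys = np.random.rand(32, 1).astype(np.float32)
            sess.run(train_op, feed_dict={x: xs, y: ys})
```

You would start the three processes separately, e.g. `python train.py --job_name=ps --task_index=0`, `python train.py --job_name=worker --task_index=0`, and `python train.py --job_name=worker --task_index=1`; each worker updates the shared variables on the parameter server with the gradients it computes on its own batches.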