I'm using Keras with Tensorflow backend on a cluster (creating neural networks). How can I run it in a multi-threaded way on the cluster (on several cores) or is this done automatically by Keras? For example in Java one can create several threads, each thread running on a core.
If possible, how many cores should be used?
Yes, a single process can run multiple threads on different cores. Caching is specific to the hardware. Many modern Intel processors have three layers of caching, where the last level cache is shared across cores.
Running TensorFlow on multicore CPUs can be an attractive option, e.g., where a workflow is dominated by IO and faster computational hardware has less impact on runtime, or simply where no GPUs are available. The key questions are then which TensorFlow package to choose and how to optimise performance on multicore CPUs.
Via TensorFlow (or Theano, or CNTK), Keras is able to run seamlessly on both CPUs and GPUs.
Multithreading: the ability of a central processing unit (CPU), or a single core in a multi-core processor, to provide multiple threads of execution concurrently, supported by the operating system [3].
Multiprocessing: the use of two or more CPUs within a single computer system [4][5].
Tensorflow automatically runs the computations on as many cores as are available on a single machine.
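If you want to control rather than just rely on that default, the session config exposes two thread-pool knobs. A minimal sketch, assuming TensorFlow ≥1.13 or 2.x so that the TF1-style API is reachable under tf.compat.v1 (the thread counts below are illustrative, not recommendations):

```python
import tensorflow as tf

# TF1-style API; under TF 2.x it lives at tf.compat.v1.
tf1 = tf.compat.v1

# intra_op threads parallelise work inside a single op (e.g. one large matmul);
# inter_op threads let independent ops in the graph run concurrently.
config = tf1.ConfigProto(
    intra_op_parallelism_threads=4,
    inter_op_parallelism_threads=2,
)
sess = tf1.Session(config=config)
```

With standalone Keras on a TF1 backend you could then register this session via keras.backend.set_session(sess) so that Keras-built models use it. Leaving both values at 0 lets TensorFlow pick thread counts based on the available cores, which is a reasonable default.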
If you have a distributed cluster, be sure you follow the instructions at https://www.tensorflow.org/how_tos/distributed/ to configure the cluster (e.g. create the tf.train.ClusterSpec correctly, etc.).
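As a rough sketch of what that configuration looks like, here is a hypothetical two-machine cluster; the host names and ports are placeholders, not values from the original post:

```python
import tensorflow as tf

# Describe the cluster: one parameter-server job and one worker job.
cluster = tf.train.ClusterSpec({
    "ps": ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222"],
})

# On each machine you would then start a server for its own role, e.g.:
# server = tf.train.Server(cluster, job_name="worker", task_index=0)
```

Every machine in the cluster runs the same ClusterSpec but starts its server with its own job_name and task_index.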
To help debug, you can set the log_device_placement configuration option on the session to have TensorFlow print out where the computations are actually placed. (Note: this works both for GPUs and for distributed TensorFlow.)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
Note that while TensorFlow's computation placement algorithm works fine for small computational graphs, you might get better performance on large computational graphs by manually placing the computations on specific devices (e.g. using with tf.device(...): blocks).
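A minimal sketch of such manual placement, using an example device string ("/cpu:0" here; you would use "/gpu:0" where a GPU is available):

```python
import tensorflow as tf

# Pin these ops to a specific device with a tf.device block.
with tf.device("/cpu:0"):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.constant([[2.0, 0.0], [0.0, 2.0]])
    c = tf.matmul(a, b)  # placed on /cpu:0 along with its inputs
```

Combined with log_device_placement above, this lets you verify that the ops really landed on the device you asked for.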