I am trying to implement distributed execution in my TensorFlow code. I created a simple example, but when I run it the program does not produce any result. My guess is that the host locations are not set properly for my Linux system.
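# Written against the TensorFlow 1.x graph/session API.
import tensorflow as tf

# One job named "local" with two tasks, each listening on its own port.
cluster = tf.train.ClusterSpec({"local": ["localhost:2222", "localhost:2223"]})

x = tf.constant(2)

# Pin each part of the computation to a specific task.
with tf.device("/job:local/task:1"):
    y2 = x - 66

with tf.device("/job:local/task:0"):
    y1 = x + 300
    y = y1 + y2

# Connect to the task 0 server; both servers must already be running.
with tf.Session("grpc://localhost:2222") as sess:
    result = sess.run(y)
    print(result)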
Before running the session above, you have to start the two workers with a separate script (python tfserver.py 0 & python tfserver.py 1). I also had to replace localhost with the actual server name because of some restrictions in the cluster.
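# tfserver.py: start one task of the "local" job.
import sys

import tensorflow as tf

# Get task number from command line (0 or 1)
task_number = int(sys.argv[1])

# Must match the cluster definition used by the client script.
cluster = tf.train.ClusterSpec({"local": ["localhost:2222", "localhost:2223"]})
server = tf.train.Server(cluster, job_name="local", task_index=task_number)

print("Starting server #{}".format(task_number))

server.start()
# Block forever so the server keeps serving requests.
server.join()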
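If the client still appears to hang, it may help to check first that both worker addresses actually accept connections. The following is only a minimal sketch using the Python standard library, not part of TensorFlow. The helper name worker_is_reachable is made up for illustration, and the addresses are assumed to be the same ones used in the ClusterSpec (replace localhost with the real server name if you had to do so there).

```python
import socket


def worker_is_reachable(address, timeout=1.0):
    """Return True if a TCP connection to 'host:port' succeeds.

    Hypothetical helper for illustration; it only tests that something
    is listening on the port, not that it is a TensorFlow server.
    """
    host, port = address.rsplit(":", 1)
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


# Assumed to match the addresses in the ClusterSpec.
workers = ["localhost:2222", "localhost:2223"]

if __name__ == "__main__":
    for address in workers:
        status = "reachable" if worker_is_reachable(address) else "NOT reachable"
        print("{} is {}".format(address, status))
```

Run it after starting both tfserver.py processes. If either address is reported as not reachable, the corresponding worker is probably not running or the host name or port is wrong.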
Source: https://databricks.com/tensorflow/distributed-computing-with-tensorflow
More advanced usage here: https://github.com/tensorflow/examples/blob/master/community/en/docs/deploy/distributed.md