What is the difference between TensorFlowOnSpark and the default distributed TensorFlow 1.0?

I am trying to install TensorFlowOnSpark on our server. My boss asked for it because he thought it would be easy to use. However, I have also read about the default distributed TensorFlow on the TensorFlow website. Can any expert explain the difference between these two ways of distributing training? Will Spark automatically assign the parameter server and the workers?

Thanks in advance.

asked May 23 '17 01:05

Jeff Wang

1 Answer

I finally installed TensorFlowOnSpark (TFOS) on the server and compared it with the default distributed TensorFlow (TF). My conclusions are:

Pros:

TFOS is more automatic. I don’t need to define which node in the cluster acts as the PS (parameter server) node, and I don’t need to upload the same code to every node.
I don’t need to type a command on each node to start the training.
Only small code changes are needed to run on TFOS.
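To make the contrast concrete, here is a minimal sketch of the manual bookkeeping that the default distributed TF 1.x setup involves: you list the PS and worker hosts yourself and start one process per task. The host names, the script name `train.py`, and the flag names (`--ps_hosts`, `--worker_hosts`, `--job_name`, `--task_index`) are assumptions for illustration. They follow a common convention for user-defined flags in training scripts and are not built into TensorFlow.

```python
# Hypothetical cluster layout: every host and port here is made up.
CLUSTER = {
    "ps": ["node1:2222"],
    "worker": ["node2:2222", "node3:2222"],
}


def build_launch_commands(cluster, script="train.py"):
    """Return one (host, command) pair per task in the cluster.

    With plain distributed TF 1.x, each of these commands has to be
    run by hand (or by your own tooling) on the matching host.
    """
    ps_hosts = ",".join(cluster["ps"])
    worker_hosts = ",".join(cluster["worker"])
    commands = []
    for job_name in ("ps", "worker"):
        for task_index, address in enumerate(cluster[job_name]):
            host = address.split(":")[0]
            command = (
                "python {script} --ps_hosts={ps} --worker_hosts={workers} "
                "--job_name={job} --task_index={index}"
            ).format(
                script=script,
                ps=ps_hosts,
                workers=worker_hosts,
                job=job_name,
                index=task_index,
            )
            commands.append((host, command))
    return commands


if __name__ == "__main__":
    for host, command in build_launch_commands(CLUSTER):
        print("[{}] {}".format(host, command))
```

With TFOS, this per-node step is what I no longer had to do by hand.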

Cons:

Sometimes two worker nodes are automatically assigned to the same GPU core (a K80 has two cores), which causes an out-of-memory error.
You need to pass a long list of configuration options on the command line before running.
You cannot specify which node will be the PS node.
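One possible workaround for the shared-GPU problem is to pin each worker process to a single GPU through the `CUDA_VISIBLE_DEVICES` environment variable before TensorFlow is imported. The sketch below is not TFOS's own mechanism, and I have not confirmed that it resolves the issue above. The helper names are hypothetical, and it assumes you can obtain each worker's index on its host in some way.

```python
import os


def pick_gpu(worker_index_on_host, gpus_per_host):
    """Map a worker's index on its host to a GPU id string.

    Uses a simple modulo, so workers only get distinct GPUs when there
    are at most `gpus_per_host` workers on the host.
    """
    if gpus_per_host <= 0:
        raise ValueError("gpus_per_host must be positive")
    return str(worker_index_on_host % gpus_per_host)


def pin_worker_to_gpu(worker_index_on_host, gpus_per_host, environ=None):
    """Set CUDA_VISIBLE_DEVICES so the process sees only one GPU.

    This must run before TensorFlow initializes CUDA in the process.
    `environ` defaults to os.environ; a plain dict can be passed for testing.
    """
    if environ is None:
        environ = os.environ
    gpu_id = pick_gpu(worker_index_on_host, gpus_per_host)
    environ["CUDA_VISIBLE_DEVICES"] = gpu_id
    return gpu_id


if __name__ == "__main__":
    # Assumed example: two workers on a host with two visible GPUs.
    for index in range(2):
        fake_env = {}
        pin_worker_to_gpu(index, 2, fake_env)
        print("worker", index, "->", fake_env["CUDA_VISIBLE_DEVICES"])
```

If more workers land on a host than it has GPUs, this scheme still doubles up, so the number of executors per host would also need to be limited.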

If I am wrong anywhere, please correct me.

answered Sep 28 '22 12:09

Jeff Wang

Tags: tensorflow, deep-learning, apache-spark, distributed
