
TensorFlow Training

Assume I have a very simple neural network, such as a multilayer perceptron. Every layer uses a sigmoid activation function, and the network is fully connected.

In TensorFlow this might be defined like this:

    import tensorflow as tf

    # n_fft, weight_variable() and bias_variable() are assumed to be defined elsewhere.
    sess = tf.InteractiveSession()

    # Training Tensor
    x = tf.placeholder(tf.float32, shape=[None, n_fft])
    # Label Tensor
    y_ = tf.placeholder(tf.float32, shape=[None, n_fft])

    # Declaring variable buffer for weights W and bias b
    # Layer structure [n_fft, n_fft, n_fft, n_fft]
    # Input -> Layer 1
    struct_w = [n_fft, n_fft]
    struct_b = [n_fft]
    W1 = weight_variable(struct_w, 'W1')
    b1 = bias_variable(struct_b, 'b1')
    h1 = tf.nn.sigmoid(tf.matmul(x, W1) + b1)

    # Layer 1 -> Layer 2
    W2 = weight_variable(struct_w, 'W2')
    b2 = bias_variable(struct_b, 'b2')
    h2 = tf.nn.sigmoid(tf.matmul(h1, W2) + b2)

    # Layer 2 -> output
    W3 = weight_variable(struct_w, 'W3')
    b3 = bias_variable(struct_b, 'b3')
    y = tf.nn.sigmoid(tf.matmul(h2, W3) + b3)

    # Calculating difference between label and output using mean square error
    mse = tf.reduce_mean(tf.square(y - y_))

    # Train the Model
    # Gradient Descent
    train_step = tf.train.GradientDescentOptimizer(0.3).minimize(mse)

The design target for this model is to map an n_fft-point FFT spectrogram to another n_fft-point target spectrogram. Let's assume both the training data and the target data are of size [3000, n_fft]. They are stored in the variables spec_train and spec_target.

Now here comes the question: in TensorFlow, is there any difference between these two training loops?

Training 1:

    for i in range(200):
        train_step.run(feed_dict={x: spec_train, y_: spec_target})

Training 2:

    for i in range(200):
        for j in range(3000):
            train = spec_train[j, :].reshape(1, n_fft)
            label = spec_target[j, :].reshape(1, n_fft)
            train_step.run(feed_dict={x: train, y_: label})

Thank you very much!

asked Dec 04 '15 21:12 by yc2986



1 Answer

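In the first training version, you are training on the entire set of training data at once, which means that the first and the 3000th element of spec_train will be processed using the same model parameters in a single step. Because mse is a mean over everything that is fed in, each of the 200 steps applies one parameter update computed from the gradient averaged over all 3000 examples. This is known as (Batch) Gradient Descent.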

In the second training version, you are training on a single example from the training data at a time, which means that the 3000th element of spec_train will be processed using model parameters that have been updated 2999 times since the first element was most recently processed. Over the 200 outer iterations this makes 200 × 3000 = 600,000 parameter updates instead of 200. This is known as Stochastic Gradient Descent (or it would be if the element were selected at random).
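To make the contrast concrete, here is a minimal pure-Python sketch (deliberately not TensorFlow) on a made-up 1-D model y = w * x. The data, the learning rate of 0.01, and the function names are illustrative assumptions and do not come from the question; only the loop structure mirrors Training 1 and Training 2.

```python
# Toy data for y = w * x; generated with w = 2 (illustrative, not from the question).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]


def grad(w, x_batch, y_batch):
    """Gradient of mean((w * x - y) ** 2) with respect to w over the batch."""
    n = len(x_batch)
    return sum(2.0 * (w * x - y) * x for x, y in zip(x_batch, y_batch)) / n


def train_full_batch(w, lr, epochs):
    """Like Training 1: one update per epoch, all examples share the same w."""
    updates = 0
    for _ in range(epochs):
        w -= lr * grad(w, xs, ys)
        updates += 1
    return w, updates


def train_per_example(w, lr, epochs):
    """Like Training 2: one update per example, so later examples see a newer w."""
    updates = 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            w -= lr * grad(w, [x], [y])
            updates += 1
    return w, updates


w_batch, n_batch = train_full_batch(0.0, 0.01, 200)
w_single, n_single = train_per_example(0.0, 0.01, 200)
print(n_batch, n_single)  # number of parameter updates in each scheme
```

On this toy problem both loops end up near w = 2, but they get there along different paths: the per-example loop performs len(xs) times as many updates, each based on a noisier gradient estimate.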

In general, TensorFlow is used with datasets that are too large to process in one batch, so mini-batch SGD (where a subset of the examples is processed in one step) is favored. Processing a single element at a time is theoretically desirable, but it is inherently sequential and has high fixed costs because the matrix multiplications and other operations are not as computationally dense. Therefore, processing a small batch (e.g. 32 or 128) of examples at once is the usual approach, with multiple replicas training on different batches in parallel.
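As a rough sketch of the mini-batch middle ground, the helper below yields shuffled index batches that cover the data once per pass. The name minibatch_indices, the batch size of 128, and the fixed seed are hypothetical choices for illustration; nothing here is a TensorFlow API.

```python
import random


def minibatch_indices(n_examples, batch_size, rng):
    """Yield lists of row indices covering every example once, in shuffled order."""
    order = list(range(n_examples))
    rng.shuffle(order)
    for start in range(0, n_examples, batch_size):
        # The final batch may be smaller than batch_size.
        yield order[start:start + batch_size]


rng = random.Random(0)  # fixed seed only to make the sketch reproducible
batches = list(minibatch_indices(3000, 128, rng))

# With the question's NumPy arrays, one training step per batch would then
# look roughly like this (assuming spec_train and spec_target are 2-D arrays):
#   for idx in batches:
#       train_step.run(feed_dict={x: spec_train[idx], y_: spec_target[idx]})
```

Since the placeholders in the question are declared with shape [None, n_fft], the same graph accepts batches of any size, so switching between the three schemes only changes what is fed in.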

See this Stats StackExchange question for a more theoretical discussion of when you should use one approach versus the other.

answered Sep 22 '22 19:09 by mrry