Assume I have a very simple neural network, like a multilayer perceptron. Each layer uses a sigmoid activation function and the network is fully connected.
In TensorFlow this might be defined like this:
    sess = tf.InteractiveSession()

    # Training tensor
    x = tf.placeholder(tf.float32, shape=[None, n_fft])
    # Label tensor
    y_ = tf.placeholder(tf.float32, shape=[None, n_fft])

    # Declare variables for the weights W and biases b
    # Layer structure: [n_fft, n_fft, n_fft, n_fft]

    # Input -> Layer 1
    struct_w = [n_fft, n_fft]
    struct_b = [n_fft]
    W1 = weight_variable(struct_w, 'W1')
    b1 = bias_variable(struct_b, 'b1')
    h1 = tf.nn.sigmoid(tf.matmul(x, W1) + b1)

    # Layer 1 -> Layer 2
    W2 = weight_variable(struct_w, 'W2')
    b2 = bias_variable(struct_b, 'b2')
    h2 = tf.nn.sigmoid(tf.matmul(h1, W2) + b2)

    # Layer 2 -> output
    W3 = weight_variable(struct_w, 'W3')
    b3 = bias_variable(struct_b, 'b3')
    y = tf.nn.sigmoid(tf.matmul(h2, W3) + b3)

    # Mean squared error between the output and the label
    mse = tf.reduce_mean(tf.square(y - y_))

    # Train the model with gradient descent
    train_step = tf.train.GradientDescentOptimizer(0.3).minimize(mse)
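For reference, weight_variable and bias_variable are just small helpers; something like the following (the truncated-normal and constant initializers, and the stddev of 0.1, are only one possible choice, in the style of the TensorFlow 1.x tutorials):

    def weight_variable(shape, name):
        # Small random initial weights (stddev 0.1 is an assumed choice).
        initial = tf.truncated_normal(shape, stddev=0.1)
        return tf.Variable(initial, name=name)

    def bias_variable(shape, name):
        # Small constant initial biases.
        initial = tf.constant(0.1, shape=shape)
        return tf.Variable(initial, name=name)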
The goal of this model is to map an n_fft-point FFT spectrogram to another n_fft-point target spectrogram. Let's assume both the training data and the target data have shape [3000, n_fft] and are stored in the variables spec_train and spec_target.
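For concreteness, spec_train and spec_target can be thought of as NumPy arrays of shape [3000, n_fft]; something like the following (the random data is only a stand-in for the real spectrograms, and the n_fft value here is purely illustrative):

    import numpy as np

    n_fft = 1024       # assumed FFT size, purely illustrative
    n_examples = 3000

    # Stand-in data with the same shape as the real spectrograms.
    spec_train = np.random.rand(n_examples, n_fft).astype(np.float32)
    spec_target = np.random.rand(n_examples, n_fft).astype(np.float32)

    # The variables need to be initialized before running train_step.
    sess.run(tf.global_variables_initializer())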
Now here comes the question: in TensorFlow, is there any difference between these two training procedures?
Training 1:
    for i in xrange(200):
        train_step.run(feed_dict={x: spec_train, y_: spec_target})
Training 2:
    for i in xrange(200):
        for j in xrange(3000):
            train = spec_train[j, :].reshape(1, n_fft)
            label = spec_target[j, :].reshape(1, n_fft)
            train_step.run(feed_dict={x: train, y_: label})
Thank you very much!
In the first training version, you are training on the entire batch of training data at once, which means that the first and the 3000th elements of spec_train will be processed using the same model parameters in a single step. This is known as (Batch) Gradient Descent.
In the second training version, you are training on a single example from the training data at a time, which means that the 3000th element of spec_train will be processed using model parameters that have been updated 2999 times since the first element was most recently processed. This is known as Stochastic Gradient Descent (or it would be if the elements were selected at random).
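As a side note, one way to turn the second version into true SGD is to visit the examples in a fresh random order on every pass; a minimal sketch, assuming spec_train and spec_target are NumPy arrays:

    import numpy as np

    for i in xrange(200):
        # Visit the 3000 examples in a random order each epoch.
        for j in np.random.permutation(3000):
            train = spec_train[j, :].reshape(1, n_fft)
            label = spec_target[j, :].reshape(1, n_fft)
            train_step.run(feed_dict={x: train, y_: label})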
In general, TensorFlow is used with datasets that are too large to process in one batch, so mini-batch SGD (where a subset of the examples are processed in one step) is favored. Processing a single element at a time is theoretically desirable, but is inherently sequential and has high fixed costs because the matrix multiplications and other operations are not as computationally dense. Therefore, processing a small batch (e.g. 32 or 128) of examples at once is the usual approach, with multiple replicas training on different batches in parallel.
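For illustration, a mini-batch version of the training loop could look like the following (the batch_size value and the shuffle-then-slice batching are assumptions, not the only way to form batches):

    import numpy as np

    batch_size = 128  # assumed mini-batch size

    for i in xrange(200):
        # Shuffle once per epoch, then step through contiguous mini-batches.
        perm = np.random.permutation(3000)
        for start in xrange(0, 3000, batch_size):
            idx = perm[start:start + batch_size]
            train_step.run(feed_dict={x: spec_train[idx, :],
                                      y_: spec_target[idx, :]})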
See this Stats StackExchange question for a more theoretical discussion of when you should use one approach versus the other.