Assume I have a very simple neural network, like a multilayer perceptron. Each layer uses a sigmoid activation function and the network is fully connected.
In TensorFlow this might be defined like this:
    sess = tf.InteractiveSession()

    # Training tensor
    x = tf.placeholder(tf.float32, shape=[None, n_fft])
    # Label tensor
    y_ = tf.placeholder(tf.float32, shape=[None, n_fft])

    # Declare variables for the weights W and biases b
    # Layer structure: [n_fft, n_fft, n_fft, n_fft]

    # Input -> Layer 1
    struct_w = [n_fft, n_fft]
    struct_b = [n_fft]
    W1 = weight_variable(struct_w, 'W1')
    b1 = bias_variable(struct_b, 'b1')
    h1 = tf.nn.sigmoid(tf.matmul(x, W1) + b1)

    # Layer 1 -> Layer 2
    W2 = weight_variable(struct_w, 'W2')
    b2 = bias_variable(struct_b, 'b2')
    h2 = tf.nn.sigmoid(tf.matmul(h1, W2) + b2)

    # Layer 2 -> output
    W3 = weight_variable(struct_w, 'W3')
    b3 = bias_variable(struct_b, 'b3')
    y = tf.nn.sigmoid(tf.matmul(h2, W3) + b3)

    # Mean squared error between the output and the label
    mse = tf.reduce_mean(tf.square(y - y_))

    # Train the model with gradient descent
    train_step = tf.train.GradientDescentOptimizer(0.3).minimize(mse)
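For reference, weight_variable and bias_variable are just small helpers; something like the following (the truncated-normal and constant initializers, and the stddev of 0.1, are only one possible choice, in the style of the TensorFlow 1.x tutorials):

    def weight_variable(shape, name):
        # Small random initial weights (stddev 0.1 is an assumed choice).
        initial = tf.truncated_normal(shape, stddev=0.1)
        return tf.Variable(initial, name=name)

    def bias_variable(shape, name):
        # Small constant initial biases.
        initial = tf.constant(0.1, shape=shape)
        return tf.Variable(initial, name=name)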
The goal of this model is to map an n_fft-point FFT spectrogram to another n_fft-point target spectrogram. Let's assume both the training data and the target data have shape [3000, n_fft] and are stored in the variables spec_train and spec_target.
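For concreteness, spec_train and spec_target can be thought of as NumPy arrays of shape [3000, n_fft]; something like the following (the random data is only a stand-in for the real spectrograms, and the n_fft value here is purely illustrative):

    import numpy as np

    n_fft = 1024       # assumed FFT size, purely illustrative
    n_examples = 3000

    # Stand-in data with the same shape as the real spectrograms.
    spec_train = np.random.rand(n_examples, n_fft).astype(np.float32)
    spec_target = np.random.rand(n_examples, n_fft).astype(np.float32)

    # The variables need to be initialized before running train_step.
    sess.run(tf.global_variables_initializer())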
Now here comes the question: in TensorFlow, is there any difference between these two training procedures?
Training 1:
    for i in xrange(200):
        train_step.run(feed_dict={x: spec_train, y_: spec_target})
Training 2:
    for i in xrange(200):
        for j in xrange(3000):
            train = spec_train[j, :].reshape(1, n_fft)
            label = spec_target[j, :].reshape(1, n_fft)
            train_step.run(feed_dict={x: train, y_: label})
Thank you very much!
In the first training version, you are training on the entire batch of training data at once, which means that the first and the 3000th elements of spec_train will be processed using the same model parameters in a single step. This is known as (Batch) Gradient Descent.
In the second training version, you are training on a single example from the training data at a time, which means that the 3000th element of spec_train will be processed using model parameters that have been updated 2999 times since the first element was most recently processed. This is known as Stochastic Gradient Descent (or it would be if the elements were selected at random).
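As a side note, one way to turn the second version into true SGD is to visit the examples in a fresh random order on every pass; a minimal sketch, assuming spec_train and spec_target are NumPy arrays:

    import numpy as np

    for i in xrange(200):
        # Visit the 3000 examples in a random order each epoch.
        for j in np.random.permutation(3000):
            train = spec_train[j, :].reshape(1, n_fft)
            label = spec_target[j, :].reshape(1, n_fft)
            train_step.run(feed_dict={x: train, y_: label})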
In general, TensorFlow is used with datasets that are too large to process in one batch, so mini-batch SGD (where a subset of the examples are processed in one step) is favored. Processing a single element at a time is theoretically desirable, but is inherently sequential and has high fixed costs because the matrix multiplications and other operations are not as computationally dense. Therefore, processing a small batch (e.g. 32 or 128) of examples at once is the usual approach, with multiple replicas training on different batches in parallel.
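For illustration, a mini-batch version of the training loop could look like the following (the batch_size value and the shuffle-then-slice batching are assumptions, not the only way to form batches):

    import numpy as np

    batch_size = 128  # assumed mini-batch size

    for i in xrange(200):
        # Shuffle once per epoch, then step through contiguous mini-batches.
        perm = np.random.permutation(3000)
        for start in xrange(0, 3000, batch_size):
            idx = perm[start:start + batch_size]
            train_step.run(feed_dict={x: spec_train[idx, :],
                                      y_: spec_target[idx, :]})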
See this Stats StackExchange question for a more theoretical discussion of when you should use one approach versus the other.