In the CIFAR-10 TensorFlow tutorial I encountered the following line:
images, label_batch = tf.train.batch(
    [image, label],
    batch_size=batch_size,
    num_threads=num_preprocess_threads,
    capacity=min_queue_examples + 3 * batch_size)
The function tf.train.batch()
seems to take only a single image and a single label as input. How does it then create a batch containing multiple images?
It does take the single pair [image, label]
as input. tf.train.batch
, however, creates a queue internally: the num_threads
threads repeatedly evaluate that pair and enqueue the results until capacity
is reached.
The returned images, label_batch
tensors are, in fact, dequeue operations that pull batch_size elements off that queue at a time.
Remember that you are defining a computational graph: the pair [image, label]
represents just two nodes of the graph, and the many real image/label pairs
of your training set flow through those nodes at run time. This is how tf.train.batch
captures the stream of images and labels and fills the queue.
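Here is a minimal, self-contained sketch of that mechanism (assuming TensorFlow 1.x, where tf.train.batch is available; the single image and label are faked with a random tensor and a constant rather than read from the CIFAR-10 binaries, purely to show the queue mechanics):

import tensorflow as tf  # TensorFlow 1.x assumed

# One image node and one label node, standing in for the tutorial's decoded example.
image = tf.random_uniform([32, 32, 3])   # a single image (one graph node)
label = tf.constant(1, dtype=tf.int32)   # a single label (one graph node)

batch_size = 16
min_queue_examples = 100

# tf.train.batch builds a FIFO queue internally. num_threads threads repeatedly
# evaluate [image, label] and enqueue the results; the returned tensors are
# dequeue ops that pull batch_size examples out of that queue at once.
images, label_batch = tf.train.batch(
    [image, label],
    batch_size=batch_size,
    num_threads=4,
    capacity=min_queue_examples + 3 * batch_size)

with tf.Session() as sess:
    # start_queue_runners launches the background threads that keep the queue full.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)

    img_batch_val, lbl_batch_val = sess.run([images, label_batch])
    print(img_batch_val.shape)   # (16, 32, 32, 3)
    print(lbl_batch_val.shape)   # (16,)

    coord.request_stop()
    coord.join(threads)

Note that the batch dimension appears only in the dequeued tensors: the single-example nodes keep shape [32, 32, 3] and [], while images and label_batch come back as [16, 32, 32, 3] and [16].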