About the tf.contrib.data.Dataset
(from TensorFlow 1.2, see here and here) usage:
The way how to get data doesn't really fit any way how I get the data usually. In my case, I have a thread and I receive data there and I don't know in advance when it will end but I see when it ends. Then I wait until I processed all the buffers and then I have finished one epoch. How can I get this logic with the Dataset
?
Note that I prefer the Dataset
interface over the QueueBase
interface because it gives me the iterator interface which I can reinitialize and even reset to a different Dataset
. This is more powerful compared to queues which cannot be reopened currently after they are closed (see here and here).
Maybe a similar question, or the same question: How can I wrap around a Dataset
over a queue? I have some thread with reads some data from somewhere and which can feed it and queue it somehow. How do I get the data into the Dataset
? I could repeat some dummy tensor infinite times and then use map
to just return my queue.dequeue()
but that really only gets me back to all the original problems with the queue, i.e. how to reopen the queue.
The new Dataset.from_generator()
method allows you to define a Dataset
that is fed by a Python generator. (To use this feature at present, you must download a nightly build of TensorFlow or build it yourself from source. It will be part of TensorFlow 1.4.)
The easiest way to implement your example would be to replace your receiving thread with a generator, with pseudocode as follows:
def receiver():
while True:
next_element = ... # Receive next element from external source.
# Note that this method may block.
end_of_epoch = ... # Decide whether or not to stop based on next_element.
if not end_of_epoch:
yield next_element # Note: you may need to convert this to an array.
else:
return # Returning will signal OutOfRangeError on downstream iterators.
dataset = tf.contrib.data.Dataset.from_generator(receiver, output_types=...)
# You can chain other `Dataset` methods after the generator. For example:
dataset = dataset.prefetch(...) # This will start a background thread
# to prefetch elements from `receiver()`.
dataset = dataset.repeat(...) # Note that each repetition will call
# `receiver()` again, and start from
# a fresh state.
dataset = dataset.batch(...)
More complicated topologies are possible. For example, you can use Dataset.interleave()
to create many receivers in parallel.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With