Why does dataset.output_shapes return Dimension(None) after batching?

Tags:

tensorflow

I'm using the Dataset API for input pipelines in TensorFlow (version r1.2). I built my dataset and batched it with a batch size of 128. The batched dataset is fed into an RNN.

Unfortunately, dataset.output_shapes returns Dimension(None) in the first dimension, so the RNN raises an error:

Traceback (most recent call last):
  File "untitled1.py", line 188, in <module>
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
  File "/home/harold/anaconda2/envs/tensorflow_py2.7/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "untitled1.py", line 121, in main
    run_training()
  File "untitled1.py", line 57, in run_training
    is_training=True)
  File "/home/harold/huawei/ConvLSTM/ConvLSTM.py", line 216, in inference
    initial_state=initial_state)
  File "/home/harold/anaconda2/envs/tensorflow_py2.7/lib/python2.7/site-packages/tensorflow/python/ops/rnn.py", line 566, in dynamic_rnn
    dtype=dtype)
  File "/home/harold/anaconda2/envs/tensorflow_py2.7/lib/python2.7/site-packages/tensorflow/python/ops/rnn.py", line 636, in _dynamic_rnn_loop
    "Input size (depth of inputs) must be accessible via shape inference,"
ValueError: Input size (depth of inputs) must be accessible via shape inference, but saw value None.

I think this error is caused by the shape of the input: the first dimension should be the batch size, not None.

Here is the code:

origin_dataset = Dataset.BetweenS_Dataset(FLAGS.data_path)
train_dataset = origin_dataset.train_dataset
test_dataset = origin_dataset.test_dataset
shuffle_train_dataset = train_dataset.shuffle(buffer_size=10000)
shuffle_batch_train_dataset = shuffle_train_dataset.batch(128)
batch_test_dataset = test_dataset.batch(FLAGS.batch_size)

iterator = tf.contrib.data.Iterator.from_structure(
    shuffle_batch_train_dataset.output_types,
    shuffle_batch_train_dataset.output_shapes)
(images, labels) = iterator.get_next()

training_init_op = iterator.make_initializer(shuffle_batch_train_dataset)
test_init_op = iterator.make_initializer(batch_test_dataset)

print(shuffle_batch_train_dataset.output_shapes)

I print output_shapes and it gives:

(TensorShape([Dimension(None), Dimension(36), Dimension(100)]), TensorShape([Dimension(None)]))

I suppose it should be 128, because I batched the dataset:

(TensorShape([Dimension(128), Dimension(36), Dimension(100)]), TensorShape([Dimension(128)]))
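For reference, here is a minimal standalone reproduction of the observation (a sketch only: toy zero tensors stand in for the real BetweenS_Dataset, and it assumes the TF 1.x tf.data API, which in r1.2 lives under tf.contrib.data):

import tensorflow as tf

# Toy stand-in for the real dataset: 1000 examples of shape [36, 100]
# plus one scalar label each.
dataset = tf.data.Dataset.from_tensor_slices(
    (tf.zeros([1000, 36, 100]), tf.zeros([1000])))
batched = dataset.batch(128)

# The batch dimension is reported as None, not 128:
print(batched.output_shapes)
# (TensorShape([Dimension(None), Dimension(36), Dimension(100)]),
#  TensorShape([Dimension(None)]))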
asked Jun 01 '17 by HaroldZ


1 Answer

This feature has since been added via the drop_remainder parameter. batch() leaves the batch dimension statically unknown because the final batch may contain fewer than batch_size elements; setting drop_remainder=True discards that partial batch, so the static shape becomes the batch size. It is used like the following:

batch_test_dataset = test_dataset.batch(FLAGS.batch_size, drop_remainder=True)

From the docs:

drop_remainder: (Optional.) A tf.bool scalar tf.Tensor, representing whether the last batch should be dropped in the case it has fewer than batch_size elements; the default behavior is not to drop the smaller batch.
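As a quick sketch of the effect (assuming TF 1.10 or later, where batch() gained the drop_remainder argument; on earlier releases, dataset.apply(tf.contrib.data.batch_and_drop_remainder(batch_size)) provided the same behavior):

import tensorflow as tf

# Toy dataset: 1000 examples of shape [36, 100] plus one label each.
dataset = tf.data.Dataset.from_tensor_slices(
    (tf.zeros([1000, 36, 100]), tf.zeros([1000])))

# Dropping the partial final batch makes the batch dimension static:
batched = dataset.batch(128, drop_remainder=True)
print(batched.output_shapes)
# (TensorShape([Dimension(128), Dimension(36), Dimension(100)]),
#  TensorShape([Dimension(128)]))

With the batch dimension statically known, shape inference in the downstream RNN can succeed.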

answered Nov 15 '22 by McAngus