
What is the difference between Dataset.from_tensors and Dataset.from_tensor_slices?

I have a dataset represented as a NumPy matrix of shape (num_features, num_examples) and I wish to convert it to a TensorFlow tf.data.Dataset.

I am struggling to understand the difference between these two methods: Dataset.from_tensors and Dataset.from_tensor_slices. Which is the right one, and why?

The TensorFlow documentation (link) says that both methods accept a nested structure of tensors, although with from_tensor_slices the tensors must have the same size in the 0th dimension.

Llewlyn asked Mar 30 '18 18:03



2 Answers

from_tensors combines the input and returns a dataset with a single element:

>>> t = tf.constant([[1, 2], [3, 4]])
>>> ds = tf.data.Dataset.from_tensors(t)
>>> [x for x in ds]
[<tf.Tensor: shape=(2, 2), dtype=int32, numpy=
array([[1, 2],
       [3, 4]], dtype=int32)>]

from_tensor_slices creates a dataset with a separate element for each row of the input tensor:

>>> t = tf.constant([[1, 2], [3, 4]])
>>> ds = tf.data.Dataset.from_tensor_slices(t)
>>> [x for x in ds]
[<tf.Tensor: shape=(2,), dtype=int32, numpy=array([1, 2], dtype=int32)>,
 <tf.Tensor: shape=(2,), dtype=int32, numpy=array([3, 4], dtype=int32)>]
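Applied to the matrix from the question, which has shape (num_features, num_examples), this suggests transposing before slicing, because from_tensor_slices always slices along axis 0. The following is a minimal sketch assuming TensorFlow 2.x with eager execution; the sizes and the names `data`, `whole` and `per_example` are made up for illustration.

```python
import numpy as np
import tensorflow as tf

# Illustrative sizes only; the question does not give concrete values.
num_features, num_examples = 3, 5
data = np.arange(num_features * num_examples, dtype=np.float32)
data = data.reshape(num_features, num_examples)

# from_tensors: the whole matrix becomes one dataset element.
whole = tf.data.Dataset.from_tensors(data)

# from_tensor_slices slices along axis 0, so transpose first to put the
# examples on the leading axis and get one element per example.
per_example = tf.data.Dataset.from_tensor_slices(data.T)

whole_elems = list(whole)
example_elems = list(per_example)

print(len(whole_elems), whole_elems[0].shape)      # 1 (3, 5)
print(len(example_elems), example_elems[0].shape)  # 5 (3,)
```

If each example should be one dataset element (the usual case for training), the transposed from_tensor_slices form is probably the one to use.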
MatthewScarpino answered Oct 14 '22 03:10


1) The main difference between the two is that the nested elements passed to from_tensor_slices must have the same size in the 0th dimension:

# Raises ValueError: Dimensions 10 and 9 are not compatible
dataset1 = tf.data.Dataset.from_tensor_slices(
    (tf.random.uniform([10, 4]), tf.random.uniform([9])))

# OK: from_tensors does not slice, so it places no constraint on the
# first dimension (here both components happen to have size 10)
dataset2 = tf.data.Dataset.from_tensors(
    (tf.random.uniform([10, 4]), tf.random.uniform([10])))
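The contrast can also be checked at runtime. This is a small sketch assuming TensorFlow 2.x; the names `features` and `labels_short` are illustrative, and the except clause catches two exception types because the exact type and message may vary between versions.

```python
import tensorflow as tf

features = tf.random.uniform([10, 4])
labels_short = tf.random.uniform([9])  # deliberately one entry short

# from_tensors never slices, so mismatched leading dimensions are accepted
# and the whole tuple becomes a single element.
single = tf.data.Dataset.from_tensors((features, labels_short))
single_elems = list(single)
print(len(single_elems))  # 1

# from_tensor_slices has to pair row i of every component, so it rejects
# components whose 0th dimensions differ.
try:
    tf.data.Dataset.from_tensor_slices((features, labels_short))
    slicing_failed = False
except (ValueError, tf.errors.InvalidArgumentError) as err:
    slicing_failed = True
    print("from_tensor_slices failed:", type(err).__name__)
```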

2) The second difference, explained here, arises when the input to a tf.data.Dataset is a list. For example:

dataset1 = tf.data.Dataset.from_tensor_slices(
    [tf.random.uniform([2, 3]), tf.random.uniform([2, 3])])

dataset2 = tf.data.Dataset.from_tensors(
    [tf.random.uniform([2, 3]), tf.random.uniform([2, 3])])

print(dataset1)  # shapes: (2, 3)
print(dataset2)  # shapes: (2, 2, 3)

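In the above, the list is first stacked into one tensor of shape (2, 2, 3). from_tensors keeps that stack as a single 3-D element, while from_tensor_slices slices it along the first axis and yields two elements of shape (2, 3). The from_tensors behaviour can be handy if you have separate sources for the different image channels and want to combine them into one RGB image tensor.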
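The channel use case might look like the sketch below, assuming TensorFlow 2.x. The constant-filled channels and the names `red`, `green`, `blue`, `image_ds` and `channel_ds` are placeholders for real image data. Note that the stacked result is channels-first; the final transpose to channels-last is an optional extra step, not something from_tensors does.

```python
import tensorflow as tf

# Placeholder channels; real code would load these from separate sources.
height, width = 4, 5
red = tf.fill([height, width], 0.1)
green = tf.fill([height, width], 0.5)
blue = tf.fill([height, width], 0.9)

# A list is converted to one stacked tensor of shape (3, height, width).
image_ds = tf.data.Dataset.from_tensors([red, green, blue])
channel_ds = tf.data.Dataset.from_tensor_slices([red, green, blue])

image_elems = list(image_ds)
channel_elems = list(channel_ds)

print(len(image_elems), image_elems[0].shape)      # 1 (3, 4, 5)
print(len(channel_elems), channel_elems[0].shape)  # 3 (4, 5)

# Optional: move the channel axis last, giving (height, width, 3).
image_hwc = tf.transpose(image_elems[0], perm=[1, 2, 0])
print(image_hwc.shape)  # (4, 5, 3)
```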

3) As mentioned in the previous answer, from_tensors packs the whole input into a single dataset element:

import tensorflow as tf

# Eager execution is the default in TF 2.x; TF 1.x needs it enabled explicitly.
if hasattr(tf, 'enable_eager_execution'):
    tf.enable_eager_execution()

dataset1 = tf.data.Dataset.from_tensor_slices(
    (tf.random.uniform([4, 2]), tf.random.uniform([4])))

dataset2 = tf.data.Dataset.from_tensors(
    (tf.random.uniform([4, 2]), tf.random.uniform([4])))

for i, item in enumerate(dataset1):
    print('element: ' + str(i + 1), item[0], item[1])

print(30*'-')

for i, item in enumerate(dataset2):
    print('element: ' + str(i + 1), item[0], item[1])

output:

element: 1 tf.Tensor(... shapes: ((2,), ()))
element: 2 tf.Tensor(... shapes: ((2,), ()))
element: 3 tf.Tensor(... shapes: ((2,), ()))
element: 4 tf.Tensor(... shapes: ((2,), ()))
-------------------------
element: 1 tf.Tensor(... shapes: ((4, 2), (4,)))
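One way to see how the two outputs relate: batching the sliced dataset with a batch size equal to the number of rows should reproduce the single element that from_tensors yields. This is a sketch assuming TensorFlow 2.x; `x`, `y`, `sliced` and `whole` are illustrative names.

```python
import tensorflow as tf

x = tf.random.uniform([4, 2])
y = tf.random.uniform([4])

sliced = tf.data.Dataset.from_tensor_slices((x, y))  # 4 elements
whole = tf.data.Dataset.from_tensors((x, y))         # 1 element

# Re-batching all 4 slices gives one element of shapes ((4, 2), (4,)).
rebatched = list(sliced.batch(4))
whole_elems = list(whole)

batch_x, batch_y = rebatched[0]
whole_x, whole_y = whole_elems[0]

same = bool(tf.reduce_all(tf.equal(batch_x, whole_x))) and \
       bool(tf.reduce_all(tf.equal(batch_y, whole_y)))
print(len(rebatched), batch_x.shape, batch_y.shape)  # 1 (4, 2) (4,)
print(same)  # True
```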
Amir answered Oct 14 '22 02:10