I have a dataset represented as a NumPy matrix of shape (num_features, num_examples), and I wish to convert it to a tf.data.Dataset.
I am struggling to understand the difference between these two methods: Dataset.from_tensors and Dataset.from_tensor_slices. Which is the right one, and why?
The TensorFlow documentation (link) says that both methods accept a nested structure of tensors, although when using from_tensor_slices the tensors should have the same size in the 0th dimension.
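For concreteness, here is roughly what I am choosing between (data is just a placeholder for my matrix):

import numpy as np
import tensorflow as tf

# Placeholder for my matrix: (num_features, num_examples)
data = np.random.rand(5, 100).astype(np.float32)

ds_a = tf.data.Dataset.from_tensors(data)        # one (5, 100) element
ds_b = tf.data.Dataset.from_tensor_slices(data)  # five (100,) elements
print(ds_a.element_spec)  # TensorSpec(shape=(5, 100), dtype=tf.float32, name=None)
print(ds_b.element_spec)  # TensorSpec(shape=(100,), dtype=tf.float32, name=None)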
from_tensors combines the input and returns a dataset with a single element:
>>> t = tf.constant([[1, 2], [3, 4]])
>>> ds = tf.data.Dataset.from_tensors(t)
>>> [x for x in ds]
[<tf.Tensor: shape=(2, 2), dtype=int32, numpy=
array([[1, 2],
       [3, 4]], dtype=int32)>]
from_tensor_slices creates a dataset with a separate element for each row of the input tensor:
>>> t = tf.constant([[1, 2], [3, 4]])
>>> ds = tf.data.Dataset.from_tensor_slices(t)
>>> [x for x in ds]
[<tf.Tensor: shape=(2,), dtype=int32, numpy=array([1, 2], dtype=int32)>,
 <tf.Tensor: shape=(2,), dtype=int32, numpy=array([3, 4], dtype=int32)>]
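As a quick sanity check on how the two relate, batching the sliced dataset back up along the 0th dimension recovers the single element that from_tensors produces:

import tensorflow as tf

t = tf.constant([[1, 2], [3, 4]])

# Slice into rows, then batch the rows back together:
# one (2, 2) element, the same as from_tensors(t).
ds = tf.data.Dataset.from_tensor_slices(t).batch(2)
print(next(iter(ds)))  # tf.Tensor([[1 2] [3 4]], shape=(2, 2), dtype=int32)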
1) The main difference between the two is that nested elements passed to from_tensor_slices must have the same size in the 0th dimension:
# Raises ValueError: Dimensions 10 and 9 are not compatible
dataset1 = tf.data.Dataset.from_tensor_slices(
    (tf.random.uniform([10, 4]), tf.random.uniform([9])))

# OK: from_tensors does not slice, so the 0th dimensions need not match
dataset2 = tf.data.Dataset.from_tensors(
    (tf.random.uniform([10, 4]), tf.random.uniform([10])))
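That constraint is precisely what makes from_tensor_slices the usual choice for paired (features, labels) data; a small sketch with made-up values:

import tensorflow as tf

# Made-up data: 4 examples with 2 features each, plus 4 labels.
features = tf.constant([[1, 2], [3, 4], [5, 6], [7, 8]])
labels = tf.constant([0, 1, 0, 1])

# Element i pairs row i of features with labels[i].
ds = tf.data.Dataset.from_tensor_slices((features, labels))
for x, y in ds:
    print(x.numpy(), y.numpy())  # [1 2] 0, [3 4] 1, ...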
2) The second difference, explained here, is when the input to a tf.data.Dataset is a list. For example:

dataset1 = tf.data.Dataset.from_tensor_slices(
    [tf.random.uniform([2, 3]), tf.random.uniform([2, 3])])
dataset2 = tf.data.Dataset.from_tensors(
    [tf.random.uniform([2, 3]), tf.random.uniform([2, 3])])
print(dataset1)  # element shapes: (2, 3)
print(dataset2)  # element shapes: (2, 2, 3)
In the above, from_tensors stacks the list into a single 3-D tensor, while from_tensor_slices treats the list as one stacked tensor and slices it back into separate (2, 3) elements. The stacking behavior can be handy if you have different sources for the different image channels and want to combine them into a single RGB image tensor.
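A hedged sketch of that channel use case, with made-up 2x2 channels (the transpose to channels-last is an addition for illustration):

import tensorflow as tf

# Made-up per-channel sources for one 2x2 image.
r = tf.random.uniform([2, 2])
g = tf.random.uniform([2, 2])
b = tf.random.uniform([2, 2])

# from_tensors stacks the list into a single (3, 2, 2) element;
# transposing gives the usual channels-last (2, 2, 3) RGB layout.
ds = tf.data.Dataset.from_tensors([r, g, b]).map(
    lambda img: tf.transpose(img, [1, 2, 0]))
print(next(iter(ds)).shape)  # (2, 2, 3)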
3) As mentioned in the previous answer, from_tensors combines the input into one big tensor, so the whole dataset is a single element:
import tensorflow as tf

dataset1 = tf.data.Dataset.from_tensor_slices(
    (tf.random.uniform([4, 2]), tf.random.uniform([4])))
dataset2 = tf.data.Dataset.from_tensors(
    (tf.random.uniform([4, 2]), tf.random.uniform([4])))

for i, item in enumerate(dataset1):
    print('element: ' + str(i + 1), item[0], item[1])

print(30 * '-')

for i, item in enumerate(dataset2):
    print('element: ' + str(i + 1), item[0], item[1])
output:

element: 1 tf.Tensor(... shapes: ((2,), ()))
element: 2 tf.Tensor(... shapes: ((2,), ()))
element: 3 tf.Tensor(... shapes: ((2,), ()))
element: 4 tf.Tensor(... shapes: ((2,), ()))
------------------------------
element: 1 tf.Tensor(... shapes: ((4, 2), (4,)))
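Applied to the matrix in the question, which is laid out as (num_features, num_examples): from_tensor_slices slices along the 0th dimension, so transpose first to get one example per element. A minimal sketch with a placeholder array:

import numpy as np
import tensorflow as tf

# Placeholder for the question's matrix: (num_features, num_examples).
data = np.random.rand(5, 100).astype(np.float32)

# Transpose so examples lie along the 0th dimension, then slice:
# 100 elements, each a (5,) feature vector.
ds = tf.data.Dataset.from_tensor_slices(data.T)
print(ds.cardinality().numpy())  # 100
print(ds.element_spec)           # TensorSpec(shape=(5,), dtype=tf.float32, name=None)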