I am using TensorFlow 2.0.0-beta1 and Python 3.7.
First consider the following piece of code where tensor.numpy() works correctly:
import tensorflow as tf
import numpy as np
np.save('data.npy',np.ones(1024))
def func(mystr):
    return np.load(mystr.numpy())
mystring = tf.constant('data.npy')
print(func(mystring))
The above code works correctly and outputs [1. 1. 1. ... 1. 1. 1.].
Now consider the following code in which tensor.numpy() doesn't work.
import tensorflow as tf
import numpy as np
np.save('data.npy',np.ones(1024))
def func(mystr):
    return np.load(mystr.numpy())
mystring = tf.constant('data.npy')
data = tf.data.Dataset.from_tensor_slices([mystring])
data.map(func,1)
The above code gives the following error: AttributeError: 'Tensor' object has no attribute 'numpy'
I am unable to figure out why tensor.numpy() doesn't work in the case of tf.data.Dataset.map()
EDIT
The following paragraph clarifies my purpose:
I have a dataset folder containing millions of data pairs (image, time series). The entire dataset won't fit into memory, so I am using tf.data.Dataset.map(func). Inside func() I want to load a NumPy file containing the time series and also load the image. For loading the image, TensorFlow has built-in functions such as tf.io.read_file and tf.image.decode_jpeg that accept a string tensor, but np.load() does not accept a string tensor. That's why I want to convert the string tensor into a standard Python string.
From the .map() documentation:
irrespective of the context in which map_func is defined (eager vs. graph), tf.data traces the function and executes it as a graph.
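To use Python code inside .map(), you have two options:
1) Rely on AutoGraph to convert your Python code into an equivalent graph computation. The downside is that AutoGraph can convert some but not all Python code.
2) Use tf.py_function, which allows you to write arbitrary Python code but will generally result in worse performance than 1). For example: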
d = tf.data.Dataset.from_tensor_slices(['hello', 'world'])
# transform a byte string tensor to a byte numpy string and decode to python str
# upper case string using a Python function
def upper_case_fn(t):
    return t.numpy().decode('utf-8').upper()
# use the python code in graph mode
d.map(lambda x: tf.py_function(func=upper_case_fn,
                               inp=[x], Tout=tf.string))  # ==> [ "HELLO", "WORLD" ]
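Applied to the np.load() case from the question, the same tf.py_function pattern might look like the sketch below. The helper name load_npy, the temporary file, the float32 output dtype and the fixed shape of 1024 are assumptions made for illustration; they are not from the original post.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Hypothetical sample file, created only so the sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'data.npy')
np.save(path, np.ones(1024))

def load_npy(path_tensor):
    # Inside tf.py_function the argument is an eager tensor, so .numpy() works.
    filename = path_tensor.numpy().decode('utf-8')
    # Cast so the result matches the Tout declared below (assumed float32).
    return np.load(filename).astype(np.float32)

def func(mystr):
    data = tf.py_function(func=load_npy, inp=[mystr], Tout=tf.float32)
    # py_function drops static shape information; restore it when it is known.
    data.set_shape([1024])
    return data

dataset = tf.data.Dataset.from_tensor_slices([path]).map(func)
loaded = [item.numpy() for item in dataset]
```

The set_shape call is optional, but later stages such as batching usually need a known static shape.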
I hope this is still useful.
The difference is that the first example is executed eagerly, whereas tf.data.Dataset pipelines are inherently lazily evaluated (with good reason). A dataset can represent arbitrarily large (even infinite) data, so it is only evaluated inside a computation graph, which lets data pass through in chunks. This means that eager-only methods such as numpy() are not available inside a dataset pipeline.
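One way to see this for yourself is to record tf.executing_eagerly() from inside the mapped function. This is only a small probe, assuming TensorFlow 2.x with eager execution enabled by default; the names probe and trace_log are made up for illustration.

```python
import tensorflow as tf

trace_log = []

def probe(x):
    # This Python body runs while tf.data traces the function into a graph,
    # so the side effect below records the execution mode at trace time.
    trace_log.append(tf.executing_eagerly())
    return x

# Outside the pipeline, TF 2.x runs eagerly by default.
eager_outside = tf.executing_eagerly()

ds = tf.data.Dataset.from_tensor_slices(['data.npy']).map(probe)

print('eager outside map:', eager_outside)
print('eager inside map: ', trace_log)
```

Because the function body runs in graph mode, the tensor it receives is symbolic and has no concrete value to hand back through numpy().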