tensor.numpy() not working in tensorflow.data.Dataset. Throws the error: AttributeError: 'Tensor' object has no attribute 'numpy'

I am using TensorFlow 2.0.0-beta1 and Python 3.7.

First consider the following piece of code where tensor.numpy() works correctly:

import tensorflow as tf
import numpy as np

np.save('data.npy',np.ones(1024))

def func(mystr): 
    return np.load(mystr.numpy())

mystring = tf.constant('data.npy')
print(func(mystring))

The above code works correctly and outputs [1. 1. 1. ... 1. 1. 1.].

Now consider the following code in which tensor.numpy() doesn't work.

import tensorflow as tf
import numpy as np

np.save('data.npy',np.ones(1024))

def func(mystr):
    return np.load(mystr.numpy())

mystring = tf.constant('data.npy')
data = tf.data.Dataset.from_tensor_slices([mystring])
data.map(func,1)

The above code gives the following error: AttributeError: 'Tensor' object has no attribute 'numpy'

I am unable to figure out why tensor.numpy() doesn't work in the case of tf.data.Dataset.map().

EDIT

The following paragraph clarifies my purpose:

I have a dataset folder containing millions of (image, time-series) data pairs. The entire dataset won't fit into memory, so I am using tf.data.Dataset.map(func). Inside func() I want to load a NumPy file containing the time series and also load the corresponding image. For loading the image there are built-in TensorFlow functions such as tf.io.read_file and tf.image.decode_jpeg that accept a string tensor. But np.load() does not accept a string tensor. That's why I want to convert the string tensor into a standard Python string.
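
In other words, func() is supposed to do roughly this for one (image, time-series) pair (the argument names here are just placeholders); the np.load() line is exactly the part that fails inside Dataset.map():

import tensorflow as tf
import numpy as np

def func(image_path, series_path):
    # tf.io.read_file / tf.image.decode_jpeg accept a string tensor directly
    image = tf.image.decode_jpeg(tf.io.read_file(image_path))
    # np.load() needs a plain Python string, hence the .numpy() call that
    # raises AttributeError once the function is traced by Dataset.map()
    series = np.load(series_path.numpy())
    return image, series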

Asked Jun 19 '19 by Aalok_G

2 Answers

From the .map() documentation:

irrespective of the context in which map_func is defined (eager vs. graph), tf.data traces the function and executes it as a graph.

To use Python code inside .map() you have two options:

  1. Rely on AutoGraph to convert Python code into an equivalent graph computation. The downside of this approach is that AutoGraph can convert some but not all Python code.
  2. Use tf.py_function, which allows you to write arbitrary Python code but will generally result in worse performance than 1).
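
For option 1, a minimal sketch that keeps the same upper-casing task as the example below but stays entirely inside TensorFlow string ops (assuming tf.strings.upper is available in your TF 2.x build), so map() can trace it with no py_function at all:

d = tf.data.Dataset.from_tensor_slices(['hello', 'world'])

# tf.strings.upper is a pure TensorFlow op, so tracing it into a graph is no problem
d = d.map(tf.strings.upper)  # ==> [ "HELLO", "WORLD" ]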

And for option 2, using tf.py_function:

import tensorflow as tf

d = tf.data.Dataset.from_tensor_slices(['hello', 'world'])

# tf.py_function hands each element to Python as an EagerTensor, so .numpy() works;
# decode the byte string to a Python str and upper-case it
def upper_case_fn(t):
    return t.numpy().decode('utf-8').upper()

# wrap the Python function so the traced map() can call it
d = d.map(lambda x: tf.py_function(func=upper_case_fn,
                                   inp=[x], Tout=tf.string))  # ==> [ "HELLO", "WORLD" ]

I hope this is still useful.
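
Applied to the np.load() case from the question, the same pattern might look roughly like this (a sketch along the lines of the question's own snippet, not a tested drop-in):

import tensorflow as tf
import numpy as np

np.save('data.npy', np.ones(1024))

def load_npy(path):
    # runs as plain Python under tf.py_function, so .numpy() is available here
    return np.load(path.numpy().decode('utf-8'))

mystring = tf.constant('data.npy')
data = tf.data.Dataset.from_tensor_slices([mystring])
data = data.map(lambda p: tf.py_function(func=load_npy, inp=[p], Tout=tf.float64))

for item in data:
    print(item.numpy())  # [1. 1. 1. ... 1. 1. 1.]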

Answered Oct 06 '22 by Pietro


The difference is that the first example is executed eagerly, whereas a tf.data.Dataset is inherently lazily evaluated (with good reason).

A Dataset can represent arbitrarily large (and even infinite) collections of data, so it is only evaluated inside a computation graph, which allows data to be streamed through in chunks.

This means that eagerly executed methods such as .numpy() are not available inside a dataset pipeline.
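
A quick illustration of the difference (a minimal sketch):

import tensorflow as tf

ds = tf.data.Dataset.from_tensor_slices([1, 2, 3])

# iterating the dataset eagerly yields EagerTensors, so .numpy() works here
for t in ds:
    print(t.numpy())          # 1, 2, 3

# map() traces its function into a graph; the argument is a symbolic Tensor
# with no concrete value yet, which is why calling t.numpy() inside it fails
def peek(t):
    print(type(t))            # printed once, while the function is traced
    return t

ds = ds.map(peek)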

Answered Oct 06 '22 by Stewart_R