Is there a simple way to use features from tf.data.Dataset.from_generator with a custom model_fn (Estimator) in TensorFlow?

I am using the TensorFlow Dataset API for my training data, with an input_fn and a generator passed to the tf.data.Dataset.from_generator API:

def generator():
    ......
    yield { "x" : features }, label


def input_fn():
    ds = tf.data.Dataset.from_generator(generator, ......)
    ......
    feature, label = ds.make_one_shot_iterator().get_next()
    return feature, label

Then I created a custom model_fn for my Estimator, with code like:

def model_fn(features, labels, mode, params):
    print(features)
    ......
    layer = network.create_full_connect(input_tensor=features["x"], ......)
    # or: layer = tf.layers.dense(features["x"], 200, ......)
    ......

When training:

estimator.train(input_fn=input_fn)

However, the code doesn't work, because the features argument passed to model_fn is something like:

Tensor("IteratorGetNext:0", dtype=float32, device=/device:CPU:0)

code "features["x"]" will fail and tell me :

......"site-packages\tensorflow\python\ops\array_ops.py", line 504, in _SliceHelper end.append(s + 1) TypeError: must be str, not int

If I change input_fn to:

input_fn = tf.estimator.inputs.numpy_input_fn(
  x={"x": np.array([[1,2,3,4,5,6]])},
  y=np.array([1]),
  ......)

the code works, because features is now a dict.

I looked through the Estimator source code and found that it uses a function such as

features, labels = self._get_features_and_labels_from_input_fn(
      input_fn, model_fn_lib.ModeKeys.TRAIN)

to retrieve features and labels from input_fn, but I have no idea why it passes my model_fn two different types of features for the two different input implementations. If I want to use my generator approach, how do I work with that kind of features tensor (IteratorGetNext)?

Thanks for any help!

[UPDATED]

I made some changes to the code:

def generator():
    ......
    yield features, label

def input_fn():
    ds = tf.data.Dataset.from_generator(generator, ......)
    ......
    feature, label = ds.make_one_shot_iterator().get_next()
    return {"x": feature}, label

However, it still fails at tf.layers.dense; now it says:

"Input 0 of layer dense_1 is incompatible with the layer: its rank is undefined, but the layer requires a defined rank."

even though features is now a dict:

'x': tf.Tensor 'IteratorGetNext:0' shape=unknown dtype=float64

In the working case, it is something like:

'x': tf.Tensor 'random_shuffle_queue_DequeueMany:1' shape=(128, 6) dtype=float64

I learned a similar usage from

https://developers.googleblog.com/2017/09/introducing-tensorflow-datasets.html

def my_input_fn(file_path, perform_shuffle=False, repeat_count=1):
   def decode_csv(line):
      ......
      d = dict(zip(feature_names, features)), label
      return d

   dataset = (tf.data.TextLineDataset(file_path)
       ......)

but there is no official example of the generator case feeding features into a custom model_fn.

asked Nov 20 '17 by raywang

1 Answer

According to the examples on how to use from_generator, the generator returns the values to put in the dataset, not a dict of features. Instead, you build the dict in the input_fn.

Altering the code as follows should make it work:

def generator():
    ......
    yield features, label

def input_fn():
    ds = tf.data.Dataset.from_generator(generator, ......)
    ......
    feature, label = ds.make_one_shot_iterator().get_next()
    return {"x": feature}, label

Replying to the update:

Your code fails because the tensor produced by the iterator of a Dataset.from_generator doesn't have a static shape defined (since the generator could, in principle, return data with different shapes). Assuming your data does indeed always have the same shape, you can call feature.set_shape(<the_shape_of_your_data>) before returning from input_fn (see the edit below for the proper way to do this).
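
A rough sketch of that set_shape() approach, assuming batches of 6-value feature vectors as in the question's working case (the feature size and the batching are assumptions):

def input_fn():
    ds = tf.data.Dataset.from_generator(generator, (tf.float64, tf.int64))
    ds = ds.batch(128)
    feature, label = ds.make_one_shot_iterator().get_next()
    # The iterator output has no static shape; declaring one lets
    # tf.layers.dense infer the input rank.
    feature.set_shape([None, 6])
    return {"x": feature}, label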

Edit:

As you pointed out in the comment, tf.data.Dataset.from_generator() has a third parameter that sets the shape of the output tensors, so instead of calling feature.set_shape() you can simply pass the shape as output_shapes in from_generator().
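
Under the same assumed shapes, the from_generator() call itself can then carry the shape information, for example:

ds = tf.data.Dataset.from_generator(
    generator,
    output_types=(tf.float64, tf.int64),
    # Third argument: static per-example shapes of the generator's outputs,
    # here an assumed 6-value feature vector and a scalar label.
    output_shapes=(tf.TensorShape([6]), tf.TensorShape([])))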

answered Nov 15 '22 by GPhilo