I am using the TensorFlow Dataset API for my training data, with an input_fn and a generator passed to the tf.data.Dataset.from_generator API:
def generator():
    ......
    yield {"x": features}, label

def input_fn():
    ds = tf.data.Dataset.from_generator(generator, ......)
    ......
    feature, label = ds.make_one_shot_iterator().get_next()
    return feature, label
Then I created a custom model_fn for my Estimator, with code like:
def model_fn(features, labels, mode, params):
    print(features)
    ......
    layer = network.create_full_connect(input_tensor=features["x"], ......)
    # or: layer = tf.layers.dense(features["x"], 200, ......)
    ......
When training:

estimator.train(input_fn=input_fn)
However, the code doesn't work, because the features parameter passed to model_fn is just a plain tensor:

Tensor("IteratorGetNext:0", dtype=float32, device=/device:CPU:0)

so the expression features["x"] fails with:

......"site-packages\tensorflow\python\ops\array_ops.py", line 504, in _SliceHelper
    end.append(s + 1)
TypeError: must be str, not int
If I change input_fn to:

input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": np.array([[1, 2, 3, 4, 5, 6]])},
    y=np.array([1]),
    ......

the code works, because features is now a dict.
I searched the Estimator source code and found that it uses a function like

features, labels = self._get_features_and_labels_from_input_fn(
    input_fn, model_fn_lib.ModeKeys.TRAIN)

to retrieve features and labels from input_fn, but I have no idea why my model_fn receives two different types of features depending on which input pipeline I use. If I want to keep my generator approach, how do I work with that type (IteratorGetNext) of features?

Thanks for any help!
[UPDATED]
I have made some changes to the code:
def generator():
    ......
    yield features, label

def input_fn():
    ds = tf.data.Dataset.from_generator(generator, ......)
    ......
    feature, label = ds.make_one_shot_iterator().get_next()
    return {"x": feature}, label
However, it still fails, now at tf.layers.dense, which says:

"Input 0 of layer dense_1 is incompatible with the layer: its rank is undefined, but the layer requires a defined rank."

even though features is now a dict:

'x': tf.Tensor 'IteratorGetNext:0' shape=unknown dtype=float64

In the working (numpy_input_fn) case, it is something like:

'x': tf.Tensor 'random_shuffle_queue_DequeueMany:1' shape=(128, 6) dtype=float64
I learned similar usage from
https://developers.googleblog.com/2017/09/introducing-tensorflow-datasets.html

def my_input_fn(file_path, perform_shuffle=False, repeat_count=1):
    def decode_csv(line):
        ......
        d = dict(zip(feature_names, features)), label
        return d
    dataset = (tf.data.TextLineDataset(file_path)
    ......

but there is no official example of the generator case, where an iterator's output is returned to a custom model_fn.
According to the examples on how to use from_generator, the generator returns the values to put in the dataset, not a dict of features. Instead, you build the dict in the input_fn. Altering the code as follows should make it work:
def generator():
    ......
    yield features, label

def input_fn():
    ds = tf.data.Dataset.from_generator(generator, ......)
    ......
    feature, label = ds.make_one_shot_iterator().get_next()
    return {"x": feature}, label
Your code fails because the tensor generated by the iterator of a Dataset.from_generator doesn't have a static shape defined (since the generator could, in principle, return data with different shapes). Assuming your data does indeed always have the same shape, you can call feature.set_shape(<the_shape_of_your_data>) before returning from input_fn. (See the edit below for the proper way to do this.)
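A minimal sketch of that workaround, assuming the dtypes (tf.float64, tf.int64) and the 6-element feature vectors from the example above:

def input_fn():
    ds = tf.data.Dataset.from_generator(generator,
                                        (tf.float64, tf.int64))
    ds = ds.batch(128)
    feature, label = ds.make_one_shot_iterator().get_next()
    # Give the batched feature tensor a static rank so tf.layers.dense
    # can build its kernel; (None, 6) assumes 6-element feature vectors
    # as in the question.
    feature.set_shape([None, 6])
    return {"x": feature}, label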
As you pointed out in the comment, tf.data.Dataset.from_generator() has a third parameter which sets the shape of the output tensors, so instead of calling feature.set_shape(), just pass the shape as output_shapes in from_generator().
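For example, a sketch under the same assumptions about dtypes and 6-element feature vectors:

def input_fn():
    ds = tf.data.Dataset.from_generator(
        generator,
        output_types=(tf.float64, tf.int64),
        # Static shapes declared up front, so no set_shape() is needed:
        # each yielded example is a 6-element vector plus a scalar label.
        output_shapes=(tf.TensorShape([6]), tf.TensorShape([])))
    ds = ds.batch(128)  # batched features then have shape (None, 6)
    feature, label = ds.make_one_shot_iterator().get_next()
    return {"x": feature}, label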