Trying to build a simple model just to figure out how to deal with tf.data.Dataset.from_generator. I cannot understand how to set the output_shapes argument; I have tried several combinations, including not specifying it at all, but I still get errors due to shape mismatches between the tensors. The idea is just to yield two numpy arrays with SIZE = 10 and run linear regression with them. Here is the code:
import numpy as np
import tensorflow as tf

SIZE = 10

def _generator():
    feats = np.random.normal(0, 1, SIZE)
    labels = np.random.normal(0, 1, SIZE)
    yield feats, labels

def input_func_gen():
    shapes = (SIZE, SIZE)
    dataset = tf.data.Dataset.from_generator(generator=_generator,
                                             output_types=(tf.float32, tf.float32),
                                             output_shapes=shapes)
    dataset = dataset.batch(10)
    dataset = dataset.repeat(20)
    iterator = dataset.make_one_shot_iterator()
    features_tensors, labels = iterator.get_next()
    features = {'x': features_tensors}
    return features, labels

def train():
    x_col = tf.feature_column.numeric_column(key='x')
    es = tf.estimator.LinearRegressor(feature_columns=[x_col])
    es = es.train(input_fn=input_func_gen)
Another question: is it possible to use this functionality to provide data for feature columns that are tf.feature_column.crossed_column? The overall goal is to use the Dataset.from_generator functionality for batch training where data is loaded in chunks from a database, in cases when the data does not fit in memory. All opinions and examples are highly appreciated. Thanks!
The optional output_shapes argument of tf.data.Dataset.from_generator() allows you to specify the shapes of the values yielded from your generator. There are two constraints on its type that define how it should be specified:

1. The output_shapes argument is a "nested structure" (e.g. a tuple, a tuple of tuples, a dict of tuples, etc.) that must match the structure of the value(s) yielded by your generator. In your program, _generator() contains the statement yield feats, labels, so the "nested structure" is a tuple of two elements (one for each array).
2. Each component of the output_shapes structure should match the shape of the corresponding tensor. The shape of an array is always a tuple of dimensions. (The shape of a tf.Tensor is more general: see this Stack Overflow question for a discussion.) Let's look at the actual shape of feats:
>>> SIZE = 10
>>> feats = np.random.normal(0, 1, SIZE)
>>> print(feats.shape)
(10,)
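Both constraints can be checked directly on the generator, without TensorFlow at all (a numpy-only sketch):

```python
import numpy as np

SIZE = 10

def _generator():
    feats = np.random.normal(0, 1, SIZE)
    labels = np.random.normal(0, 1, SIZE)
    yield feats, labels

# Pull one value out of the generator and inspect it.
value = next(_generator())

# Constraint 1: the yielded value is a 2-element tuple.
print(type(value).__name__, len(value))   # tuple 2

# Constraint 2: each component has shape (SIZE,), i.e. (10,).
print(value[0].shape, value[1].shape)     # (10,) (10,)
```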
Therefore the output_shapes argument should be a 2-element tuple, where each element is (SIZE,):

shapes = ((SIZE,), (SIZE,))

dataset = tf.data.Dataset.from_generator(generator=_generator,
                                         output_types=(tf.float32, tf.float32),
                                         output_shapes=shapes)
Finally, you will need to provide a little more information about shapes to the tf.feature_column.numeric_column() and tf.estimator.LinearRegressor() APIs:

x_col = tf.feature_column.numeric_column(key='x', shape=(SIZE,))

es = tf.estimator.LinearRegressor(feature_columns=[x_col],
                                  label_dimension=10)
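The label_dimension value follows from the batching: stacking 10 elements of shape (SIZE,), as dataset.batch(10) does, produces tensors of shape (10, SIZE), so each row of a batch carries SIZE label values. A numpy-only sketch of that stacking:

```python
import numpy as np

SIZE = 10

# Simulate what dataset.batch(10) does to ten (SIZE,)-shaped arrays:
# stack them along a new leading batch dimension.
batch = np.stack([np.random.normal(0, 1, SIZE) for _ in range(10)])
print(batch.shape)  # (10, 10) -> (batch_size, SIZE)
```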