
Tensorflow: How to use dataset from generator in Estimator

I am trying to build a simple model just to figure out how to work with tf.data.Dataset.from_generator. I cannot work out how to set the output_shapes argument. I tried several combinations, including not specifying it at all, but I still get errors caused by a shape mismatch between tensors. The idea is just to yield two NumPy arrays of length SIZE = 10 and run linear regression on them. Here is the code:

import numpy as np
import tensorflow as tf

SIZE = 10


def _generator():
    feats = np.random.normal(0, 1, SIZE)
    labels = np.random.normal(0, 1, SIZE)
    yield feats, labels


def input_func_gen():
    shapes = (SIZE, SIZE)
    dataset = tf.data.Dataset.from_generator(generator=_generator,
                                             output_types=(tf.float32, tf.float32),
                                             output_shapes=shapes)
    dataset = dataset.batch(10)
    dataset = dataset.repeat(20)
    iterator = dataset.make_one_shot_iterator()
    features_tensors, labels = iterator.get_next()
    features = {'x': features_tensors}
    return features, labels


def train():
    x_col = tf.feature_column.numeric_column(key='x')
    es = tf.estimator.LinearRegressor(feature_columns=[x_col])
    es = es.train(input_fn=input_func_gen)

Another question: is it possible to use this functionality to provide data for feature columns that are tf.feature_column.crossed_column? The overall goal is to use Dataset.from_generator for batch training, where data is loaded in chunks from a database because it does not fit in memory. All opinions and examples are highly appreciated.

Thanks!

Y. Boshev asked Feb 13 '18 14:02



1 Answer

The optional output_shapes argument of tf.data.Dataset.from_generator() allows you to specify the shapes of the values yielded from your generator. There are two constraints on its type that define how it should be specified:

  • The output_shapes argument is a "nested structure" (e.g. a tuple, a tuple of tuples, a dict of tuples, etc.) that must match the structure of the value(s) yielded by your generator.

    In your program, _generator() contains the statement yield feats, labels. Therefore the "nested structure" is a tuple of two elements (one for each array).

  • Each component of the output_shapes structure should match the shape of the corresponding tensor. The shape of a NumPy array is always a tuple of dimensions. (The shape of a tf.Tensor is more general: see this Stack Overflow question for a discussion.) Let's look at the actual shape of feats:

    >>> SIZE = 10
    >>> feats = np.random.normal(0, 1, SIZE)
    >>> print(feats.shape)
    (10,)
    

Therefore the output_shapes argument should be a 2-element tuple, where each element is (SIZE,):

shapes = ((SIZE,), (SIZE,))
dataset = tf.data.Dataset.from_generator(generator=_generator,
                                         output_types=(tf.float32, tf.float32),
                                         output_shapes=shapes)
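
If you are unsure what to pass, one way to check is to pull a single element from the generator and look at its shapes with plain NumPy. The sketch below is only an illustration: infer_shapes is a hypothetical helper (not part of TensorFlow), and it assumes the generator yields a flat tuple of arrays.

```python
import numpy as np

SIZE = 10


def _generator():
    # Same generator as in the question: one (features, labels) pair.
    feats = np.random.normal(0, 1, SIZE)
    labels = np.random.normal(0, 1, SIZE)
    yield feats, labels


def infer_shapes(generator_fn):
    """Hypothetical helper: shape of each array in the first yielded element."""
    first = next(generator_fn())
    return tuple(np.shape(component) for component in first)


shapes = infer_shapes(_generator)
print(shapes)  # ((10,), (10,))
```

The result is a tuple of tuples, one per yielded array, which is the form output_shapes expects here. The original (SIZE, SIZE) is a flat tuple of two integers and does not have that structure.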

Finally, you will need to provide a little more information about shapes to the tf.feature_column.numeric_column() and tf.estimator.LinearRegressor() APIs:

x_col = tf.feature_column.numeric_column(key='x', shape=(SIZE,))
es = tf.estimator.LinearRegressor(feature_columns=[x_col],
                                  label_dimension=SIZE)
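
On the broader goal in the question (loading data in chunks from a database), a generator along the following lines might work. This is a rough sketch under several assumptions: the samples table, its column layout, CHUNK_ROWS and make_demo_db are all hypothetical, and SQLite from the standard library stands in for whatever database is actually used. The generator fetches rows in chunks but yields one example at a time, so dataset.batch() can still do the batching.

```python
import os
import sqlite3
import tempfile

import numpy as np

SIZE = 10        # features per example, as in the question
CHUNK_ROWS = 4   # hypothetical number of rows fetched per round trip


def make_demo_db(n_rows):
    """Create a throwaway SQLite file with a hypothetical 'samples' table."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    conn = sqlite3.connect(path)
    cols = ", ".join("f%d REAL" % i for i in range(SIZE))
    conn.execute("CREATE TABLE samples (%s, label REAL)" % cols)
    rows = np.random.RandomState(0).normal(0, 1, (n_rows, SIZE + 1)).tolist()
    placeholders = ", ".join("?" * (SIZE + 1))
    conn.executemany("INSERT INTO samples VALUES (%s)" % placeholders, rows)
    conn.commit()
    conn.close()
    return path


def chunked_generator(db_path):
    """Yield (features, label) pairs one at a time, reading rows in chunks."""
    # The connection is opened inside the generator so that it is created
    # and used in the same place (assumption: this avoids sharing it).
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute("SELECT * FROM samples")
        while True:
            chunk = cursor.fetchmany(CHUNK_ROWS)
            if not chunk:
                break
            block = np.asarray(chunk, dtype=np.float32)
            for row in block:
                # features have shape (SIZE,), the label has shape (1,)
                yield row[:SIZE], row[SIZE:]
    finally:
        conn.close()


path = make_demo_db(10)
examples = list(chunked_generator(path))
os.remove(path)
```

With this generator each label has shape (1,), so the matching output_shapes would be ((SIZE,), (1,)). Since the generator here takes an argument, it could be wrapped as lambda: chunked_generator(path) before being passed to from_generator.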

mrry answered Oct 19 '22 03:10