Training and Predicting with instance keys

Tags: tensorflow, google-cloud-ml-engine

I am able to train my model and use ML Engine for prediction, but my results don't include any identifying information. This works fine when submitting one row at a time for prediction, but when submitting multiple rows I have no way of connecting each prediction back to the original input data. The GCP documentation discusses using instance keys, but I can't find any example code that trains and predicts using an instance key. Taking the GCP census example, how would I update the input functions to pass a unique ID through the graph, ignore it during training, and still return it with the predictions? Alternatively, if anyone knows of a different example that already uses keys, that would help as well.

From the Census Estimator Sample:

def serving_input_fn():
    feature_placeholders = {
      column.name: tf.placeholder(column.dtype, [None])
      for column in INPUT_COLUMNS
    }

    features = {
      key: tf.expand_dims(tensor, -1)
      for key, tensor in feature_placeholders.items()
    }

    return input_fn_utils.InputFnOps(
      features,
      None,
      feature_placeholders
    )


def generate_input_fn(filenames,
                  num_epochs=None,
                  shuffle=True,
                  skip_header_lines=0,
                  batch_size=40):

    def _input_fn():
        files = tf.concat([
          tf.train.match_filenames_once(filename)
          for filename in filenames
        ], axis=0)

        filename_queue = tf.train.string_input_producer(
          files, num_epochs=num_epochs, shuffle=shuffle)
        reader = tf.TextLineReader(skip_header_lines=skip_header_lines)

        _, rows = reader.read_up_to(filename_queue, num_records=batch_size)

        row_columns = tf.expand_dims(rows, -1)
        columns = tf.decode_csv(row_columns, record_defaults=CSV_COLUMN_DEFAULTS)
        features = dict(zip(CSV_COLUMNS, columns))

        # Remove unused columns
        for col in UNUSED_COLUMNS:
            features.pop(col)

        if shuffle:
            features = tf.train.shuffle_batch(
                features,
                batch_size,
                capacity=batch_size * 10,
                min_after_dequeue=batch_size * 2 + 1,
                num_threads=multiprocessing.cpu_count(),
                enqueue_many=True,
                allow_smaller_final_batch=True
            )
        label_tensor = parse_label_column(features.pop(LABEL_COLUMN))
        return features, label_tensor

    return _input_fn

Update: I was able to use the suggested code from the answer below; I just needed to alter it slightly to update the output alternatives in the model_fn_ops instead of just the prediction dict. However, this only works if my serving input function is coded for JSON inputs. My serving input function was previously modeled after the CSV serving input function in the Census Core Sample.

I think my problem comes from the build_standardized_signature_def function, and more specifically from the is_classification_problem function that it calls. With the CSV serving function the input dict has length 1, so this logic ends up using classification_signature_def, which only displays the scores (which turn out to be the probabilities). With the JSON serving input function the input dict has more than one entry, so predict_signature_def is used instead, and it includes all of the outputs.
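The selection described above can be illustrated with a small, framework-free sketch. It encodes only the behavior reported in this update and is not copied from TensorFlow: the function name choose_signature, the string return values, and the assumption that the check looks for "classes" or "scores" among the outputs are all hypothetical.

```python
def choose_signature(problem_type, input_tensors, output_tensors):
    """Simplified stand-in for the signature selection described above.

    Hypothetical helper: it mirrors the observed behavior, not the actual
    TensorFlow source.
    """
    has_class_outputs = "classes" in output_tensors or "scores" in output_tensors
    if (problem_type == "classification"
            and len(input_tensors) == 1
            and has_class_outputs):
        # One input (e.g. a single CSV string tensor): classification
        # signature, which exposes only classes/scores and drops extras
        # such as the key.
        return "classification_signature_def"
    # Several named inputs (e.g. JSON features plus a key): generic predict
    # signature, which exposes every output tensor.
    return "predict_signature_def"


# Placeholder strings stand in for real tensors.
csv_inputs = {"csv_row": "<string tensor>"}
json_inputs = {"age": "<tensor>", "education": "<tensor>", "key": "<tensor>"}
outputs = {"scores": "<tensor>", "key": "<tensor>"}

csv_signature = choose_signature("classification", csv_inputs, outputs)
json_signature = choose_signature("classification", json_inputs, outputs)
```

Under these assumptions the CSV case selects the classification signature and loses the key, while the JSON case selects the predict signature and keeps it.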


asked Jun 06 '17 05:06

dobbysock1002

1 Answer

UPDATE: In version 1.3 the contrib estimators (tf.contrib.learn.DNNClassifier, for example) were changed to inherit from the core estimator class tf.estimator.Estimator which, unlike its predecessor, hides the model function as a private class member. You'll need to replace estimator.model_fn in the solution below with estimator._model_fn.
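If the same wrapper has to run on both sides of that change, one option is to look the attribute up by name. The following is a minimal sketch that encodes only the rule stated in this update; the helper get_model_fn and the two stand-in classes are hypothetical, and it has not been checked against real TensorFlow releases, which may expose these attributes differently.

```python
def get_model_fn(estimator):
    """Return the estimator's model function under either attribute name."""
    # Per the update above: public attribute before 1.3, private from 1.3 on.
    model_fn = getattr(estimator, "model_fn", None)
    if model_fn is None:
        model_fn = getattr(estimator, "_model_fn")
    return model_fn


class OldStyleEstimator(object):
    """Hypothetical stand-in exposing a public model_fn."""
    def model_fn(self, features, labels, mode, params):
        return "old"


class NewStyleEstimator(object):
    """Hypothetical stand-in exposing only a private _model_fn."""
    def _model_fn(self, features, labels, mode, params):
        return "new"


old_result = get_model_fn(OldStyleEstimator())({}, None, "infer", {})
new_result = get_model_fn(NewStyleEstimator())({}, None, "infer", {})
```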

Josh's answer points you to the Flowers example, which is a good solution if you want to use a custom estimator. If you want to stick with a canned estimator (e.g. tf.contrib.learn.DNNClassifier), you can wrap it in a custom estimator that adds support for keys. (Note: I think it's likely that canned estimators will gain key support when they move into core.)

import tensorflow as tf  # TensorFlow 1.x (tf.contrib.learn)

KEY = 'key'

def key_model_fn_gen(estimator):
    def _model_fn(features, labels, mode, params):
        # Remove the key so the wrapped estimator never sees it as a feature.
        key = features.pop(KEY, None)
        model_fn_ops = estimator.model_fn(
            features=features, labels=labels, mode=mode, params=params)
        # Compare against None: a tensor cannot be used as a Python bool.
        if key is not None:
            model_fn_ops.predictions[KEY] = key
            # This line adds the key to the outputs of the exported SavedModel
            model_fn_ops.output_alternatives[None][1][KEY] = key
        return model_fn_ops
    return _model_fn

my_key_estimator = tf.contrib.learn.Estimator(
    model_fn=key_model_fn_gen(
        tf.contrib.learn.DNNClassifier(model_dir=model_dir...)
    ),
    model_dir=model_dir
)

my_key_estimator can then be used exactly like your DNNClassifier would be used, except that it will expect a feature named 'key' from the input_fns (prediction, evaluation and training).
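To see what the wrapper does without running TensorFlow, here is a framework-free sketch of the same pattern. It is only an illustration: fake_model_fn, the 'age' feature, and the use of plain lists in place of tensors are hypothetical, and this variant takes a model function directly and copies the feature dict instead of mutating it.

```python
from types import SimpleNamespace

KEY = 'key'


def fake_model_fn(features, labels, mode, params):
    """Hypothetical stand-in for a canned estimator's model function."""
    assert KEY not in features  # the wrapped model never sees the key
    scores = [age / 100.0 for age in features['age']]
    return SimpleNamespace(predictions={'scores': scores})


def key_model_fn_gen(model_fn):
    def _model_fn(features, labels, mode, params):
        features = dict(features)      # copy so the caller's dict is untouched
        key = features.pop(KEY, None)  # strip the key before the real model
        ops = model_fn(features=features, labels=labels, mode=mode, params=params)
        if key is not None:
            ops.predictions[KEY] = key  # re-attach the key to the outputs
        return ops
    return _model_fn


wrapped = key_model_fn_gen(fake_model_fn)
with_key = wrapped({'age': [25, 40], KEY: [101, 102]}, None, 'infer', {})
without_key = wrapped({'age': [25, 40]}, None, 'infer', {})
```

The key bypasses the model entirely and reappears next to the scores, and a batch without a key passes through unchanged.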

EDIT2: You will also need to add the corresponding input tensor to the prediction input function of your choice. For example, a new JSON serving input fn would look like:

def json_serving_input_fn():
  inputs = # ... input_dict as before
  # shape and dtype are passed by keyword; dtype is the first positional argument
  inputs[KEY] = tf.placeholder(shape=[None], dtype=tf.int64)
  features = # .. feature dict made from input_dict as before
  return tf.contrib.learn.InputFnOps(features, None, inputs)

(This is slightly different between 1.2 and 1.3: tf.contrib.learn.InputFnOps is replaced with tf.estimator.export.ServingInputReceiver, and padding tensors to rank 2 is no longer necessary in 1.3.)

ML Engine will then send a tensor named "key" with your prediction request; it is passed to your model and returned alongside your predictions.
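On the client side, the key is what lets you match each prediction to its input row. The sketch below assumes JSON instances serialized one per line and a response made of one dict per instance; the helpers build_instances and join_predictions, the 'age' feature, and the simulated response are hypothetical and do not come from an actual ML Engine call.

```python
import json


def build_instances(rows, key_name='key'):
    """Attach a unique integer key to a copy of each input row."""
    return [dict(row, **{key_name: i}) for i, row in enumerate(rows)]


def join_predictions(rows, predictions, key_name='key'):
    """Pair each original row with its prediction, whatever the output order."""
    by_key = {p[key_name]: p for p in predictions}
    return [(row, by_key[i]) for i, row in enumerate(rows)]


rows = [{'age': 25}, {'age': 40}]
instances = build_instances(rows)

# One JSON object per line (assumed input format for a prediction job).
request_lines = [json.dumps(inst, sort_keys=True) for inst in instances]

# Simulated response, deliberately out of order.
predictions = [
    {'key': 1, 'scores': [0.3, 0.7]},
    {'key': 0, 'scores': [0.9, 0.1]},
]
joined = join_predictions(rows, predictions)
```

Using the row index as the key keeps the join trivial; any unique int64 value would work with the placeholder defined in the serving input function.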

EDIT3: Modified key_model_fn_gen to support ignoring missing key values.

EDIT4: Added key for prediction.


answered Oct 08 '22 04:10

Eli Bixby

