TensorFlow Inference


I've been digging around on this for a while. I have found a ton of articles, but none really show plain TensorFlow inference. It's always "use the serving engine" or use a graph that is pre-coded/defined.

Here is the problem: I have a device that occasionally checks for updated models. It then needs to load the new model and run inputs through it to get predictions.

In Keras this was simple: build a model, train the model, and then call model.predict(). Same thing in scikit-learn.

I am able to grab a new model and load it, and I can print out all of the weights, but how in the world do I run inference against it?

Code to load model and print weights:


    with tf.Session() as sess:
        new_saver = tf.train.import_meta_graph(MODEL_PATH + '.meta', clear_devices=True)
        new_saver.restore(sess, MODEL_PATH)
        for var in tf.trainable_variables():
            print(sess.run(var))

I printed out all of my collections and I have: ['queue_runners', 'variables', 'losses', 'summaries', 'train_op', 'cond_context', 'trainable_variables']

I tried using sess.run(train_op); however, that just kicked off a full training session, which is not what I want. I just want to run inference on a different set of inputs that I provide, which are not TFRecords.

Just a little more detail:

The device can use C++ or Python, as long as I can produce an .exe. I can set up a feed dict if I want to feed the system. I trained with TFRecords, but in production I'm not going to use TFRecords; it's a real-time/near-real-time system.

Thanks for any input. I am posting sample code, which does all the training and sample inference, to this repo: https://github.com/drcrook1/CIFAR10/TensorFlow

Any hints are greatly appreciated!

------------EDITS-----------------

I rebuilt the model as follows:


    def inference(images):
        '''
        Portion of the compute graph that takes an input and converts it into a Y output
        '''
        with tf.variable_scope('Conv1') as scope:
            C_1_1 = ld.cnn_layer(images, (5, 5, 3, 32), (1, 1, 1, 1), scope, name_postfix='1')
            C_1_2 = ld.cnn_layer(C_1_1, (5, 5, 32, 32), (1, 1, 1, 1), scope, name_postfix='2')
            P_1 = ld.pool_layer(C_1_2, (1, 2, 2, 1), (1, 2, 2, 1), scope)
        with tf.variable_scope('Dense1') as scope:
            # Note: this flattens C_1_2, not the pooled P_1, so the pooling result is unused,
            # and the batch dimension is fixed to CONSTANTS.BATCH_SIZE
            P_1 = tf.reshape(C_1_2, (CONSTANTS.BATCH_SIZE, -1))
            dim = P_1.get_shape()[1].value
            D_1 = ld.mlp_layer(P_1, dim, NUM_DENSE_NEURONS, scope, act_func=tf.nn.relu)
        with tf.variable_scope('Dense2') as scope:
            D_2 = ld.mlp_layer(D_1, NUM_DENSE_NEURONS, CONSTANTS.NUM_CLASSES, scope)
        H = tf.nn.softmax(D_2, name='prediction')
        return H

Notice that I added the name 'prediction' to the softmax operation so I can retrieve it later.
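One detail worth checking when retrieving that node later: `get_operation_by_name('prediction')` returns a `tf.Operation`, and `sess.run` on an Operation returns `None`; the values live in the op's first output tensor, `'prediction:0'`. A minimal sketch, written against `tensorflow.compat.v1` so it also runs on a TF 2.x install, with a toy placeholder standing in for the real model:

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

with tf.Graph().as_default() as graph:
    # Toy stand-in for the model: a 3-class softmax named 'prediction'.
    x = tf.placeholder(tf.float32, shape=(None, 3))
    tf.nn.softmax(x, name="prediction")

    pred_op = graph.get_operation_by_name("prediction")     # a tf.Operation
    pred_tensor = graph.get_tensor_by_name("prediction:0")  # output 0 of that op, a tf.Tensor

    with tf.Session() as sess:
        feed = {x: np.zeros((1, 3), dtype=np.float32)}
        op_result = sess.run(pred_op, feed_dict=feed)          # running an Operation yields None
        tensor_result = sess.run(pred_tensor, feed_dict=feed)  # fetching the Tensor yields values

print(op_result, tensor_result)
```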

For training, I used the TFRecords input pipeline with input queues.


    GRAPH = tf.Graph()
    with GRAPH.as_default():
        examples, labels = Inputs.read_inputs(CONSTANTS.RecordPaths,
                                              batch_size=CONSTANTS.BATCH_SIZE,
                                              img_shape=CONSTANTS.IMAGE_SHAPE,
                                              num_threads=CONSTANTS.INPUT_PIPELINE_THREADS)
        examples = tf.reshape(examples, [CONSTANTS.BATCH_SIZE, CONSTANTS.IMAGE_SHAPE[0],
                                         CONSTANTS.IMAGE_SHAPE[1], CONSTANTS.IMAGE_SHAPE[2]])
        logits = Vgg3CIFAR10.inference(examples)
        loss = Vgg3CIFAR10.loss(logits, labels)
        OPTIMIZER = tf.train.AdamOptimizer(CONSTANTS.LEARNING_RATE)

I am attempting to use feed_dict on the loaded operation in the graph; however, now it simply hangs...


    MODEL_PATH = 'models/' + CONSTANTS.MODEL_NAME + '.model'

    # Note: nothing in the imported graph is connected to this placeholder
    images = tf.placeholder(tf.float32, shape=(1, 32, 32, 3))

    def run_inference():
        '''Runs inference against a loaded model'''
        with tf.Session() as sess:
            #sess.run(tf.global_variables_initializer())
            new_saver = tf.train.import_meta_graph(MODEL_PATH + '.meta', clear_devices=True)
            new_saver.restore(sess, MODEL_PATH)
            # returns a tf.Operation; fetching it yields None rather than values
            pred = tf.get_default_graph().get_operation_by_name('prediction')
            rand = np.random.rand(1, 32, 32, 3)
            print(rand)
            print(pred)
            # 'prediction' still pulls from the TFRecord input queue; no queue runners are started, so this blocks
            print(sess.run(pred, feed_dict={images: rand}))
            print('done')

    run_inference()

I believe this is not working because the original network was trained using TFRecords. In the sample CIFAR dataset the data is small; our real dataset is huge, and it is my understanding that TFRecords are the default best practice for training a network. The feed_dict makes perfect sense from a productionizing perspective; we can spin up some threads and populate it from our input systems.

So I guess I have a network that is trained, and I can get the prediction operation, but how do I tell it to stop using the input queues and start using the feed_dict? Remember that from the production perspective I do not have access to whatever the scientists did to make it. They do their thing, and we put it in production using whatever standard we agreed upon.
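One TF 1.x behavior that helps here: `feed_dict` is not limited to placeholders. Feeding any intermediate tensor replaces its value, and the session then does not run the ops that would normally produce it, so the queue dequeue never executes and nothing blocks. The sketch below assumes `tensorflow.compat.v1`, a toy model, and a reshape tensor named 'infer/input' between the queue and the model; in a real checkpoint that boundary tensor's name has to be looked up in the imported graph. The fed array must match the tensor's static shape, which matters for graphs built with a fixed batch size.

```python
import os
import tempfile

import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

BATCH = 2  # stands in for CONSTANTS.BATCH_SIZE
ckpt_path = os.path.join(tempfile.mkdtemp(), "toy.model")

# "Training" graph: the model reads from a queue, like the TFRecord pipeline.
with tf.Graph().as_default():
    queue = tf.FIFOQueue(capacity=8, dtypes=[tf.float32], shapes=[(4,)])
    batch = queue.dequeue_many(BATCH)  # nothing is ever enqueued, so running this would block
    examples = tf.reshape(batch, (-1, 4), name="infer/input")
    weights = tf.Variable(tf.ones((4, 3)), name="weights")
    tf.nn.softmax(tf.matmul(examples, weights), name="prediction")
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        tf.train.Saver().save(sess, ckpt_path)

# Inference: import the saved graph and feed the reshape output directly.
with tf.Graph().as_default():
    saver = tf.train.import_meta_graph(ckpt_path + ".meta", clear_devices=True)
    with tf.Session() as sess:
        saver.restore(sess, ckpt_path)
        # Shape must match the tensor's static shape (BATCH, 4).
        data = np.random.rand(BATCH, 4).astype(np.float32)
        probs = sess.run("prediction:0", feed_dict={"infer/input:0": data})

print(probs)
```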

-------INPUT OPS--------

    tf.Operation 'input/input_producer/Const' type=Const
    tf.Operation 'input/input_producer/Size' type=Const
    tf.Operation 'input/input_producer/Greater/y' type=Const
    tf.Operation 'input/input_producer/Greater' type=Greater
    tf.Operation 'input/input_producer/Assert/Const' type=Const
    tf.Operation 'input/input_producer/Assert/Assert/data_0' type=Const
    tf.Operation 'input/input_producer/Assert/Assert' type=Assert
    tf.Operation 'input/input_producer/Identity' type=Identity
    tf.Operation 'input/input_producer/RandomShuffle' type=RandomShuffle
    tf.Operation 'input/input_producer' type=FIFOQueueV2
    tf.Operation 'input/input_producer/input_producer_EnqueueMany' type=QueueEnqueueManyV2
    tf.Operation 'input/input_producer/input_producer_Close' type=QueueCloseV2
    tf.Operation 'input/input_producer/input_producer_Close_1' type=QueueCloseV2
    tf.Operation 'input/input_producer/input_producer_Size' type=QueueSizeV2
    tf.Operation 'input/input_producer/Cast' type=Cast
    tf.Operation 'input/input_producer/mul/y' type=Const
    tf.Operation 'input/input_producer/mul' type=Mul
    tf.Operation 'input/input_producer/fraction_of_32_full/tags' type=Const
    tf.Operation 'input/input_producer/fraction_of_32_full' type=ScalarSummary
    tf.Operation 'input/TFRecordReaderV2' type=TFRecordReaderV2
    tf.Operation 'input/ReaderReadV2' type=ReaderReadV2

------END INPUT OPS-----

----UPDATE 3----

I believe what I need to do is remove the input section of the graph that was trained with TFRecords and rewire the input of the first layer to a new input. It's a bit like performing surgery, but as crazy as it sounds, this is the only way I can find to do inference if I trained using TFRecords...

[Image: full graph]

[Image: section of the graph to remove]

So I think the question becomes: how does one remove the input section of the graph and replace it with a feed_dict?

A follow-up question: is this really the right way to do it? This seems bonkers.

----END UPDATE 3----
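The "surgery" does not have to be done by hand: `import_meta_graph` forwards extra keyword arguments to the meta-graph importer, and its `input_map` argument rewires every consumer of a named tensor to a tensor you supply, such as a new placeholder. A minimal sketch under assumptions: `tensorflow.compat.v1`, a toy queue-fed model, and a boundary tensor called 'infer/input' that would have to be looked up in the real graph. If the model code itself hard-codes the batch size, as the reshape to `CONSTANTS.BATCH_SIZE` in the question's `inference` does, the new input still has to supply full batches.

```python
import os
import tempfile

import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

ckpt_path = os.path.join(tempfile.mkdtemp(), "toy.model")

# "Training" graph fed by a queue (stand-in for the TFRecord pipeline).
with tf.Graph().as_default():
    queue = tf.FIFOQueue(capacity=8, dtypes=[tf.float32], shapes=[(4,)])
    examples = tf.reshape(queue.dequeue_many(2), (-1, 4), name="infer/input")
    weights = tf.Variable(tf.random_normal((4, 3)), name="weights")
    tf.nn.softmax(tf.matmul(examples, weights), name="prediction")
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        saved_weights = sess.run(weights)
        tf.train.Saver().save(sess, ckpt_path)

# Inference graph: a new placeholder replaces 'infer/input:0' during import.
with tf.Graph().as_default() as graph:
    images = tf.placeholder(tf.float32, shape=(None, 4), name="images")
    saver = tf.train.import_meta_graph(
        ckpt_path + ".meta", clear_devices=True, input_map={"infer/input:0": images})
    prediction = graph.get_tensor_by_name("prediction:0")
    with tf.Session() as sess:
        saver.restore(sess, ckpt_path)
        data = np.random.rand(1, 4).astype(np.float32)  # batch size is no longer fixed
        probs = sess.run(prediction, feed_dict={images: data})

print(probs)
```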

---link to checkpoint files---

https://drcdata.blob.core.windows.net/checkpoints/CIFAR_10_VGG3_50neuron_1pool_1e-3lr_adam.model.zip?st=2017-05-01T21%3A56%3A00Z&se=2020-05-02T21%3A56%3A00Z&sp=rl&sv=2015-12-11&sr=b&sig=oBCGxlOusB4NOEKnSnD%2FTlRYa5NKNIwAX1IyuZXAr9o%3D

--end link to checkpoint files---

-----UPDATE 4 -----

I gave in and tried the 'normal' way of performing inference, assuming I could have the scientists simply pickle their models so we could grab the model pickle, unpack it, and then run inference on it. To test this, I tried the normal way, assuming the model was already unpacked... It doesn't work worth beans either...


    import tensorflow as tf
    import CONSTANTS
    import Vgg3CIFAR10
    import numpy as np
    from scipy import misc
    import time

    MODEL_PATH = 'models/' + CONSTANTS.MODEL_NAME + '.model'
    imgs_bsdir = 'C:/data/cifar_10/train/'

    images = tf.placeholder(tf.float32, shape=(1, 32, 32, 3))

    logits = Vgg3CIFAR10.inference(images)

    def run_inference():
        '''Runs inference against a loaded model'''
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            new_saver = tf.train.import_meta_graph(MODEL_PATH + '.meta')#, import_scope='1', input_map={'input:0': images})
            # the imported graph brings its own copy of the variables and its own restore ops;
            # `logits` was built above from separate, freshly initialized variables
            new_saver.restore(sess, MODEL_PATH)
            pred = tf.get_default_graph().get_operation_by_name('prediction')
            #tf.train.start_queue_runners(sess)
            print(pred)
            for i in range(1, 25):
                img = misc.imread(imgs_bsdir + str(i) + '.png').astype(np.float32) / 255.0
                img = img.reshape(1, 32, 32, 3)
                print(sess.run(logits, feed_dict={images : img}))
                time.sleep(3)
            print('done')

    run_inference()

TensorFlow ends up building a new graph with the inference function and then appends everything from the loaded model's graph to it. So when I populate a feed_dict expecting to get predictions back, I just get a bunch of random garbage, as if it were the first pass through the network...

Again, this seems nuts; do I really need to write my own framework for serializing and deserializing arbitrary networks? This must have been done before...

-----END UPDATE 4-----

Again, thanks!


asked Apr 30 '17 16:04

David Crook

1 Answer

Alright, this took way too much time to figure out, so here is the answer for the rest of the world.

Quick reminder: I needed to persist a model that can be dynamically loaded and used for inference without knowing anything about its underpinnings or internals.

Step 1: Create the model as a class, ideally one that implements an interface definition.


    class Vgg3Model:

        NUM_DENSE_NEURONS = 50
        DENSE_RESHAPE = 32 * (CONSTANTS.IMAGE_SHAPE[0] // 2) * (CONSTANTS.IMAGE_SHAPE[1] // 2)

        def inference(self, images):
            '''
            Portion of the compute graph that takes an input and converts it into a Y output
            '''
            with tf.variable_scope('Conv1') as scope:
                C_1_1 = ld.cnn_layer(images, (5, 5, 3, 32), (1, 1, 1, 1), scope, name_postfix='1')
                C_1_2 = ld.cnn_layer(C_1_1, (5, 5, 32, 32), (1, 1, 1, 1), scope, name_postfix='2')
                P_1 = ld.pool_layer(C_1_2, (1, 2, 2, 1), (1, 2, 2, 1), scope)
            with tf.variable_scope('Dense1') as scope:
                # -1 lets the batch size vary, so the same code serves training batches and single images
                P_1 = tf.reshape(P_1, (-1, self.DENSE_RESHAPE))
                dim = P_1.get_shape()[1].value
                D_1 = ld.mlp_layer(P_1, dim, self.NUM_DENSE_NEURONS, scope, act_func=tf.nn.relu)
            with tf.variable_scope('Dense2') as scope:
                D_2 = ld.mlp_layer(D_1, self.NUM_DENSE_NEURONS, CONSTANTS.NUM_CLASSES, scope)
            H = tf.nn.softmax(D_2, name='prediction')
            return H

        def loss(self, logits, labels):
            '''
            Adds Loss to all variables
            '''
            # Note: callers pass the softmax output H of inference() here; this op expects
            # unnormalized scores, so passing D_2 instead would be the conventional choice
            cross_entr = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels)
            cross_entr = tf.reduce_mean(cross_entr)
            tf.summary.scalar('cost', cross_entr)
            tf.add_to_collection('losses', cross_entr)
            return tf.add_n(tf.get_collection('losses'), name='total_loss')
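The interface definition mentioned in Step 1 could be as small as an abstract base class that every production model class inherits from; a class that forgets a method then fails when it is constructed, not later on the device. A minimal sketch in plain Python; `ModelInterface` and `IncompleteModel` are hypothetical names, not part of the repository.

```python
from abc import ABC, abstractmethod


class ModelInterface(ABC):
    """Hypothetical contract for production model classes such as Vgg3Model."""

    @abstractmethod
    def inference(self, images):
        """Build the forward pass on `images` and return the 'prediction' tensor."""

    @abstractmethod
    def loss(self, logits, labels):
        """Build and return the training loss."""


class IncompleteModel(ModelInterface):
    """Implements inference() only, so it cannot be instantiated."""

    def inference(self, images):
        return images


try:
    IncompleteModel()
    rejected = False
except TypeError:
    rejected = True  # the abstract method loss() is not implemented

print(rejected)
```

A real model would then be declared as `class Vgg3Model(ModelInterface):`.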

Step 2: Train your network with whatever inputs you want; in my case I used queue runners and TFRecords. Note that this step is done by a different team, which iterates on, builds, designs, and optimizes models, and the models can change over time. The output they produce must be retrievable from a remote location so we can dynamically load updated models on devices (reflashing hardware is a pain, especially when it is geographically distributed). In this instance, the team drops the 3 files produced by the graph saver, along with a pickle of the model object used for that training session.


    import pickle
    import time

    import tensorflow as tf

    # CONSTANTS, Inputs, vgg3 (the model module) and gradients_summary come from the repo

    model = vgg3.Vgg3Model()

    def create_sess_ops():
        '''
        Creates and returns operations needed for running
        a tensorflow training session
        '''
        GRAPH = tf.Graph()
        with GRAPH.as_default():
            examples, labels = Inputs.read_inputs(CONSTANTS.RecordPaths,
                                                  batch_size=CONSTANTS.BATCH_SIZE,
                                                  img_shape=CONSTANTS.IMAGE_SHAPE,
                                                  num_threads=CONSTANTS.INPUT_PIPELINE_THREADS)
            examples = tf.reshape(examples, [-1, CONSTANTS.IMAGE_SHAPE[0],
                                             CONSTANTS.IMAGE_SHAPE[1], CONSTANTS.IMAGE_SHAPE[2]], name='infer/input')
            logits = model.inference(examples)
            loss = model.loss(logits, labels)
            OPTIMIZER = tf.train.AdamOptimizer(CONSTANTS.LEARNING_RATE)
            gradients = OPTIMIZER.compute_gradients(loss)
            apply_gradient_op = OPTIMIZER.apply_gradients(gradients)
            gradients_summary(gradients)
            summaries_op = tf.summary.merge_all()
            return [apply_gradient_op, summaries_op, loss, logits], GRAPH

    def main():
        '''
        Run and Train CIFAR 10
        '''
        print('starting...')
        ops, GRAPH = create_sess_ops()
        total_duration = 0.0
        with tf.Session(graph=GRAPH) as SESSION:
            COORDINATOR = tf.train.Coordinator()
            THREADS = tf.train.start_queue_runners(SESSION, COORDINATOR)
            SESSION.run(tf.global_variables_initializer())
            SUMMARY_WRITER = tf.summary.FileWriter('Tensorboard/' + CONSTANTS.MODEL_NAME, graph=GRAPH)
            GRAPH_SAVER = tf.train.Saver()

            for EPOCH in range(CONSTANTS.EPOCHS):
                duration = 0
                error = 0.0
                start_time = time.time()
                for batch in range(CONSTANTS.MINI_BATCHES):
                    _, summaries, cost_val, prediction = SESSION.run(ops)
                    error += cost_val
                duration += time.time() - start_time
                total_duration += duration
                SUMMARY_WRITER.add_summary(summaries, EPOCH)
                print('Epoch %d: loss = %.2f (%.3f sec)' % (EPOCH, error, duration))
                if EPOCH == CONSTANTS.EPOCHS - 1 or error < 0.005:
                    # EPOCH is zero-based, so EPOCH + 1 epochs have completed
                    print('Done training for %d epochs. (%.3f sec)' % (EPOCH + 1, total_duration))
                    break
            GRAPH_SAVER.save(SESSION, 'models/' + CONSTANTS.MODEL_NAME + '.model')
            with open('models/' + CONSTANTS.MODEL_NAME + '.pkl', 'wb') as output:
                pickle.dump(model, output)
            COORDINATOR.request_stop()
            COORDINATOR.join(THREADS)
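One thing worth knowing about the pickle dropped alongside the checkpoint: pickle stores a class instance as a reference to its module and class name plus the instance's attribute dictionary, not the class's code. Any process that unpickles the model therefore needs the defining module to be importable. A small plain-Python sketch; the module `vgg3_model` is created in memory here as a hypothetical stand-in for the real model file.

```python
import pickle
import sys
import types

# Build an in-memory module standing in for a hypothetical vgg3_model.py.
module = types.ModuleType("vgg3_model")
exec("class Vgg3Model:\n    NUM_DENSE_NEURONS = 50\n", module.__dict__)
sys.modules["vgg3_model"] = module

payload = pickle.dumps(module.Vgg3Model())

# The pickle holds the module path and class name, but not the class body.
stores_reference = b"vgg3_model" in payload and b"Vgg3Model" in payload
stores_code = b"NUM_DENSE_NEURONS" in payload

# Simulate a device that received the .pkl but not the .py file.
del sys.modules["vgg3_model"]
try:
    pickle.loads(payload)
    load_failed = False
except ImportError:  # ModuleNotFoundError is a subclass of ImportError
    load_failed = True

print(stores_reference, stores_code, load_failed)
```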

Step 3: Run some inference. Load your pickled model, create a new graph by piping a new placeholder into the model's inference function, and then call restore on a Saver. DO NOT RESTORE THE WHOLE GRAPH (the meta graph); RESTORE JUST THE VARIABLES.


    import pickle

    import numpy as np
    import tensorflow as tf
    from scipy import misc  # misc.imread exists only in older SciPy releases

    import CONSTANTS

    MODEL_PATH = 'models/' + CONSTANTS.MODEL_NAME + '.model'
    imgs_bsdir = 'C:/data/cifar_10/train/'

    images = tf.placeholder(tf.float32, shape=(1, 32, 32, 3))
    # unpickling requires the module that defines the model class to be importable
    with open('models/vgg3.pkl', 'rb') as model_in:
        model = pickle.load(model_in)
    logits = model.inference(images)

    def run_inference():
        '''Runs inference against a loaded model'''
        with tf.Session() as sess:
            # optional: restore() overwrites every variable the Saver covers
            sess.run(tf.global_variables_initializer())
            new_saver = tf.train.Saver()
            new_saver.restore(sess, MODEL_PATH)
            print("Starting...")
            for i in range(20, 30):
                print(str(i) + '.png')
                img = misc.imread(imgs_bsdir + str(i) + '.png').astype(np.float32) / 255.0
                img = img.reshape(1, 32, 32, 3)
                pred = sess.run(logits, feed_dict={images : img})
                max_node = np.argmax(pred)
                print('predicted label: ' + str(max_node))
            print('done')

    run_inference()
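Step 3 in isolation: the same model code runs twice, once on the training input and once on a new placeholder, and because it creates variables with the same names both times, a plain `tf.train.Saver()` can restore the checkpoint into the fresh graph. A minimal, self-contained sketch assuming `tensorflow.compat.v1`; the toy `inference` function, the `get_var` helper and the constant "training" input are hypothetical stand-ins for `Vgg3Model.inference` and the TFRecord pipeline.

```python
import os
import tempfile

import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()


def inference(images):
    """Toy stand-in for Vgg3Model.inference(); the checkpoint is keyed on variable names."""
    with tf.variable_scope("Dense1"):
        weights = tf.get_variable("weights", shape=(4, 3))
        biases = tf.get_variable("biases", shape=(3,), initializer=tf.zeros_initializer())
    return tf.nn.softmax(tf.matmul(images, weights) + biases, name="prediction")


def get_var(name):
    """Hypothetical helper: find a variable by name in the current default graph."""
    return next(v for v in tf.global_variables() if v.op.name == name)


ckpt_path = os.path.join(tempfile.mkdtemp(), "toy.model")

# Training side: a constant tensor stands in for the queue-based input pipeline.
with tf.Graph().as_default():
    inference(tf.constant(np.random.rand(8, 4), dtype=tf.float32))
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        saved_weights = sess.run(get_var("Dense1/weights"))
        tf.train.Saver().save(sess, ckpt_path)

# Inference side: rebuild the model on a placeholder and restore the variables only.
with tf.Graph().as_default():
    images = tf.placeholder(tf.float32, shape=(None, 4))
    prediction = inference(images)
    with tf.Session() as sess:
        tf.train.Saver().restore(sess, ckpt_path)  # every variable comes from the checkpoint
        restored_weights = sess.run(get_var("Dense1/weights"))
        probs = sess.run(prediction, feed_dict={images: np.ones((1, 4), np.float32)})

print(np.allclose(saved_weights, restored_weights), probs.argmax())
```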

There are definitely ways to improve on this, using interfaces and perhaps packaging everything up better, but this works and sets the stage for how we will move forward.

FINAL NOTE: When we finally pushed this to production, we ended up having to ship the stupid mymodel_model.py file down with everything else to build up the graph. So we now enforce a naming convention for all models, and there is also a coding standard for production model runs so we can do this properly.
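The final note follows from how pickle works: unpickling needs the model's module to be importable, so the .py file has to travel with the .pkl. A sketch of a loader built on a naming convention, assuming (hypothetically) that a model called `vgg3` ships as `vgg3.pkl` plus `vgg3_model.py` in the same directory; `import_model_module` and `load_model` are illustrative helpers, not part of any library.

```python
import importlib.util
import os
import pickle
import sys
import tempfile


def import_model_module(model_dir, module_name):
    """Import <module_name>.py from model_dir and register it in sys.modules."""
    path = os.path.join(model_dir, module_name + ".py")
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module  # pickle resolves classes through sys.modules
    spec.loader.exec_module(module)
    return module


def load_model(model_dir, name):
    """Hypothetical convention: <name>_model.py defines the class, <name>.pkl holds the instance."""
    import_model_module(model_dir, name + "_model")
    with open(os.path.join(model_dir, name + ".pkl"), "rb") as f:
        return pickle.load(f)


# Simulate what the training team drops into the model directory.
model_dir = tempfile.mkdtemp()
with open(os.path.join(model_dir, "vgg3_model.py"), "w") as f:
    f.write("class Vgg3Model:\n    NUM_DENSE_NEURONS = 50\n")
trained = import_model_module(model_dir, "vgg3_model").Vgg3Model()
with open(os.path.join(model_dir, "vgg3.pkl"), "wb") as f:
    pickle.dump(trained, f)
del sys.modules["vgg3_model"]  # forget the class, as a fresh process on the device would

# Device side: load by model name only.
model = load_model(model_dir, "vgg3")
print(type(model).__name__, model.NUM_DENSE_NEURONS)
```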

Good Luck!


answered Sep 18 '22 17:09

David Crook
