I've been digging around on this for a while. I have found a ton of articles, but none really show plain TensorFlow inference as just that: plain inference. It's always "use the serving engine", or a graph that is pre-coded/pre-defined.
Here is the problem: I have a device which occasionally checks for updated models. It then needs to load that model and run input predictions through the model.
In Keras this was simple: build a model, train the model, then call model.predict(). Same thing in scikit-learn.
I am able to grab a new model and load it, and I can print out all of the weights; but how in the world do I run inference against it?
Code to load model and print weights:
```python
with tf.Session() as sess:
    new_saver = tf.train.import_meta_graph(MODEL_PATH + '.meta', clear_devices=True)
    new_saver.restore(sess, MODEL_PATH)
    for var in tf.trainable_variables():
        print(sess.run(var))
```
I printed out all of my collections and I have: ['queue_runners', 'variables', 'losses', 'summaries', 'train_op', 'cond_context', 'trainable_variables']
I tried using sess.run(train_op); however, that just kicked off a full training session, which is not what I want to do. I just want to run inference against a different set of inputs that I provide, which are not TFRecords.
Just a little more detail:
The device can use C++ or Python, as long as I can produce a .exe. I can set up a feed dict if I want to feed the system. I trained with TFRecords, but in production I'm not going to use TFRecords; it's a real/near-real-time system.
Thanks for any input. I am posting sample code to this repo: https://github.com/drcrook1/CIFAR10/TensorFlow which does all the training and sample inference.
Any hints are greatly appreciated!
------------EDITS-----------------
I rebuilt the model as below:
```python
def inference(images):
    '''
    Portion of the compute graph that takes an input and converts it into a Y output
    '''
    with tf.variable_scope('Conv1') as scope:
        C_1_1 = ld.cnn_layer(images, (5, 5, 3, 32), (1, 1, 1, 1), scope, name_postfix='1')
        C_1_2 = ld.cnn_layer(C_1_1, (5, 5, 32, 32), (1, 1, 1, 1), scope, name_postfix='2')
        P_1 = ld.pool_layer(C_1_2, (1, 2, 2, 1), (1, 2, 2, 1), scope)
    with tf.variable_scope('Dense1') as scope:
        P_1 = tf.reshape(C_1_2, (CONSTANTS.BATCH_SIZE, -1))
        dim = P_1.get_shape()[1].value
        D_1 = ld.mlp_layer(P_1, dim, NUM_DENSE_NEURONS, scope, act_func=tf.nn.relu)
    with tf.variable_scope('Dense2') as scope:
        D_2 = ld.mlp_layer(D_1, NUM_DENSE_NEURONS, CONSTANTS.NUM_CLASSES, scope)
    H = tf.nn.softmax(D_2, name='prediction')
    return H
```
Notice that I add the name 'prediction' to the TF operation so I can retrieve it later.
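As a quick aside on what that name buys you (TF1 naming mechanics, not from the original post): `name='prediction'` names the softmax *op* 'prediction', and the op's first output *tensor* then becomes addressable as 'prediction:0'. The two are retrieved with different calls, and only the tensor is something you can `sess.run()` for a value:

```python
graph = tf.get_default_graph()
pred_op = graph.get_operation_by_name('prediction')   # the Operation itself; sess.run(op) returns None
pred = graph.get_tensor_by_name('prediction:0')       # its output Tensor; this is what you run for values
```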
When training, I used the input pipeline for TFRecords and input queues.
```python
GRAPH = tf.Graph()
with GRAPH.as_default():
    examples, labels = Inputs.read_inputs(CONSTANTS.RecordPaths,
                                          batch_size=CONSTANTS.BATCH_SIZE,
                                          img_shape=CONSTANTS.IMAGE_SHAPE,
                                          num_threads=CONSTANTS.INPUT_PIPELINE_THREADS)
    examples = tf.reshape(examples, [CONSTANTS.BATCH_SIZE, CONSTANTS.IMAGE_SHAPE[0],
                                     CONSTANTS.IMAGE_SHAPE[1], CONSTANTS.IMAGE_SHAPE[2]])
    logits = Vgg3CIFAR10.inference(examples)
    loss = Vgg3CIFAR10.loss(logits, labels)
    OPTIMIZER = tf.train.AdamOptimizer(CONSTANTS.LEARNING_RATE)
```
I am attempting to use feed_dict on the loaded operation in the graph; however, now it just hangs...
```python
MODEL_PATH = 'models/' + CONSTANTS.MODEL_NAME + '.model'
images = tf.placeholder(tf.float32, shape=(1, 32, 32, 3))

def run_inference():
    '''Runs inference against a loaded model'''
    with tf.Session() as sess:
        #sess.run(tf.global_variables_initializer())
        new_saver = tf.train.import_meta_graph(MODEL_PATH + '.meta', clear_devices=True)
        new_saver.restore(sess, MODEL_PATH)
        pred = tf.get_default_graph().get_operation_by_name('prediction')
        rand = np.random.rand(1, 32, 32, 3)
        print(rand)
        print(pred)
        print(sess.run(pred, feed_dict={images: rand}))
        print('done')

run_inference()
```
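A quick way to see why a run like this can block (a diagnostic sketch, my reading rather than anything from the original post): the restored 'prediction' op is still wired to the TFRecord queue from training, and feeding a placeholder that isn't on that path doesn't detach it, so run() waits on queue runners that were never started. Walking the op's upstream dependencies makes the wiring visible:

```python
pred_op = tf.get_default_graph().get_operation_by_name('prediction')

# Walk the op's upstream dependencies; if a QueueDequeue*/ReaderRead* op
# shows up in the lineage, the output still depends on the training input
# pipeline and sess.run() will block without queue runners.
to_visit, seen = list(pred_op.inputs), set()
while to_visit:
    t = to_visit.pop()
    if t.op.name in seen:
        continue
    seen.add(t.op.name)
    print(t.op.name, t.op.type)
    to_visit.extend(t.op.inputs)
```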
I believe this is not working because the original network was trained using TFRecords. In the sample CIFAR data set the data is small; our real data set is huge, and it is my understanding that TFRecords are the default best practice for training a network. The feed_dict makes perfect sense from a productionizing perspective; we can spin up some threads and populate that thing from our input systems.
So I guess I have a network that is trained and I can get the prediction operation; but how do I tell it to stop using the input queues and start using the feed_dict? Remember that from the production perspective I do not have access to whatever the scientists did to make it. They do their thing, and we stick it in production using whatever agreed-upon standard.
-------INPUT OPS--------
```
tf.Operation 'input/input_producer/Const' type=Const
tf.Operation 'input/input_producer/Size' type=Const
tf.Operation 'input/input_producer/Greater/y' type=Const
tf.Operation 'input/input_producer/Greater' type=Greater
tf.Operation 'input/input_producer/Assert/Const' type=Const
tf.Operation 'input/input_producer/Assert/Assert/data_0' type=Const
tf.Operation 'input/input_producer/Assert/Assert' type=Assert
tf.Operation 'input/input_producer/Identity' type=Identity
tf.Operation 'input/input_producer/RandomShuffle' type=RandomShuffle
tf.Operation 'input/input_producer' type=FIFOQueueV2
tf.Operation 'input/input_producer/input_producer_EnqueueMany' type=QueueEnqueueManyV2
tf.Operation 'input/input_producer/input_producer_Close' type=QueueCloseV2
tf.Operation 'input/input_producer/input_producer_Close_1' type=QueueCloseV2
tf.Operation 'input/input_producer/input_producer_Size' type=QueueSizeV2
tf.Operation 'input/input_producer/Cast' type=Cast
tf.Operation 'input/input_producer/mul/y' type=Const
tf.Operation 'input/input_producer/mul' type=Mul
tf.Operation 'input/input_producer/fraction_of_32_full/tags' type=Const
tf.Operation 'input/input_producer/fraction_of_32_full' type=ScalarSummary
tf.Operation 'input/TFRecordReaderV2' type=TFRecordReaderV2
tf.Operation 'input/ReaderReadV2' type=ReaderReadV2
```
------END INPUT OPS-----
----UPDATE 3----
I believe what I need to do is kill the input section of the graph trained with TFRecords and rewire the input to the first layer to a new input. It's kind of like performing surgery; but this is the only way I can find to do inference if I trained using TFRecords, as crazy as it sounds...
Full graph: (TensorBoard screenshot)
Section to kill: (TensorBoard screenshot of the input pipeline)
So I think the question becomes: how does one kill the input section of the graph and replace it with a feed_dict?
A follow up to this would be: is this really the right way to do it? This seems bonkers.
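For what it's worth, TF1 does ship a mechanism for exactly this kind of rewiring: the input_map argument of tf.train.import_meta_graph, which splices a tensor from the current graph in wherever a named tensor sat in the saved graph. A minimal sketch, under one big assumption: 'examples_reshape:0' is a hypothetical name standing in for whichever tensor actually feeds the first conv layer (the reshaped image batch, not the raw ReaderReadV2 output; the op dump above is where you would hunt for it):

```python
import numpy as np
import tensorflow as tf

images = tf.placeholder(tf.float32, shape=(1, 32, 32, 3))

with tf.Session() as sess:
    # Splice `images` in where the training input used to be.
    # 'examples_reshape:0' is hypothetical; it must name the tensor that
    # feeds the first layer in the saved graph.
    saver = tf.train.import_meta_graph(MODEL_PATH + '.meta',
                                       clear_devices=True,
                                       input_map={'examples_reshape:0': images})
    saver.restore(sess, MODEL_PATH)
    pred = tf.get_default_graph().get_tensor_by_name('prediction:0')
    print(sess.run(pred, feed_dict={images: np.random.rand(1, 32, 32, 3)}))
```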
----END UPDATE 3----
---link to checkpoint files---
https://drcdata.blob.core.windows.net/checkpoints/CIFAR_10_VGG3_50neuron_1pool_1e-3lr_adam.model.zip?st=2017-05-01T21%3A56%3A00Z&se=2020-05-02T21%3A56%3A00Z&sp=rl&sv=2015-12-11&sr=b&sig=oBCGxlOusB4NOEKnSnD%2FTlRYa5NKNIwAX1IyuZXAr9o%3D
--end link to checkpoint files---
-----UPDATE 4 -----
I gave in and took a shot at the 'normal' way of performing inference, assuming I could have the scientists simply pickle their models; then we grab the model pickle, unpack it, and run inference on it. So to test, I tried the normal way, assuming we had already unpacked it... It doesn't work worth beans either...
```python
import tensorflow as tf
import CONSTANTS
import Vgg3CIFAR10
import numpy as np
from scipy import misc
import time

MODEL_PATH = 'models/' + CONSTANTS.MODEL_NAME + '.model'
imgs_bsdir = 'C:/data/cifar_10/train/'

images = tf.placeholder(tf.float32, shape=(1, 32, 32, 3))
logits = Vgg3CIFAR10.inference(images)

def run_inference():
    '''Runs inference against a loaded model'''
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        new_saver = tf.train.import_meta_graph(MODEL_PATH + '.meta')#, import_scope='1', input_map={'input:0': images})
        new_saver.restore(sess, MODEL_PATH)
        pred = tf.get_default_graph().get_operation_by_name('prediction')
        #enq = sess.graph.get_operation_by_name(enqueue_op)  # leftover from earlier queue experiments
        #tf.train.start_queue_runners(sess)
        print(pred)
        for i in range(1, 25):
            img = misc.imread(imgs_bsdir + str(i) + '.png').astype(np.float32) / 255.0
            img = img.reshape(1, 32, 32, 3)
            print(sess.run(logits, feed_dict={images: img}))
            time.sleep(3)
        print('done')

run_inference()
```
TensorFlow ends up building a new graph with the inference function, and then import_meta_graph appends everything from the loaded graph onto the end of it. So when I populate a feed_dict expecting to get inferences back, I just get a bunch of random garbage, as if it were the first pass through the network...
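My reading of why (an editorial diagnosis, but easy to verify): Vgg3CIFAR10.inference(images) creates one fresh set of variables, import_meta_graph then imports a second copy under uniquified names, and restore() fills only the imported copy, so `logits` runs on the never-trained originals. Listing the graph's variables makes the duplication visible:

```python
# After building logits and importing the meta graph, print every variable;
# two parallel sets (e.g. 'Conv1/...' plus an imported 'Conv1_1/...'-style
# copy; exact suffixes may vary) mean the restore filled the wrong set.
for v in tf.global_variables():
    print(v.name, v.shape)
```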
Again, this seems nuts; do I really need to write my own framework for serializing and deserializing random networks? This has to have been done before...
-----END UPDATE 4-----
Again, thanks!
Just pull on the output node (here, the op named 'prediction') and you'll have what you want. This applies to just about any model you create: you'll have computed the prediction probabilities as one of the last steps before computing the loss.
Alright, this took way too much time to figure out, so here is the answer for the rest of the world.
Quick reminder: I needed to persist a model that can be dynamically loaded and inferred against without knowledge of the underpinnings or insides of how it works.
Step 1: Create your model as a class, and ideally use an interface definition (a sketch of such an interface follows the class below)
```python
class Vgg3Model:

    NUM_DENSE_NEURONS = 50
    DENSE_RESHAPE = 32 * (CONSTANTS.IMAGE_SHAPE[0] // 2) * (CONSTANTS.IMAGE_SHAPE[1] // 2)

    def inference(self, images):
        '''
        Portion of the compute graph that takes an input and converts it into a Y output
        '''
        with tf.variable_scope('Conv1') as scope:
            C_1_1 = ld.cnn_layer(images, (5, 5, 3, 32), (1, 1, 1, 1), scope, name_postfix='1')
            C_1_2 = ld.cnn_layer(C_1_1, (5, 5, 32, 32), (1, 1, 1, 1), scope, name_postfix='2')
            P_1 = ld.pool_layer(C_1_2, (1, 2, 2, 1), (1, 2, 2, 1), scope)
        with tf.variable_scope('Dense1') as scope:
            P_1 = tf.reshape(P_1, (-1, self.DENSE_RESHAPE))
            dim = P_1.get_shape()[1].value
            D_1 = ld.mlp_layer(P_1, dim, self.NUM_DENSE_NEURONS, scope, act_func=tf.nn.relu)
        with tf.variable_scope('Dense2') as scope:
            D_2 = ld.mlp_layer(D_1, self.NUM_DENSE_NEURONS, CONSTANTS.NUM_CLASSES, scope)
        H = tf.nn.softmax(D_2, name='prediction')
        return H

    def loss(self, logits, labels):
        '''
        Adds loss to all variables
        '''
        cross_entr = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels)
        cross_entr = tf.reduce_mean(cross_entr)
        tf.summary.scalar('cost', cross_entr)
        tf.add_to_collection('losses', cross_entr)
        return tf.add_n(tf.get_collection('losses'), name='total_loss')
```
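The interface itself isn't shown in the original, so here is a minimal sketch of what that contract could look like (the TrainableModel name and the abc-based shape are my assumptions):

```python
from abc import ABC, abstractmethod

class TrainableModel(ABC):
    '''Hypothetical contract between the science team and production:
    anything they ship must be able to rebuild its graph from scratch.'''

    @abstractmethod
    def inference(self, images):
        '''Build and return the input -> prediction portion of the graph.'''

    @abstractmethod
    def loss(self, logits, labels):
        '''Attach the training loss and return the total-loss tensor.'''
```

Vgg3Model above would then declare `class Vgg3Model(TrainableModel):` and satisfy both methods.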
Step 2: Train your network with whatever inputs you want; in my case I used queue runners and TFRecords. Note that this step is done by a different team, which iterates, builds, designs, and optimizes models. This can also change over time. The output they produce must be able to be pulled from a remote location so we can dynamically load the updated models on devices (reflashing hardware is a pain, especially if it is geographically distributed). In this instance, the team drops the 3 files associated with a graph saver, plus a pickle of the model class used for that training session.
```python
model = vgg3.Vgg3Model()

def create_sess_ops():
    '''
    Creates and returns operations needed for running
    a tensorflow training session
    '''
    GRAPH = tf.Graph()
    with GRAPH.as_default():
        examples, labels = Inputs.read_inputs(CONSTANTS.RecordPaths,
                                              batch_size=CONSTANTS.BATCH_SIZE,
                                              img_shape=CONSTANTS.IMAGE_SHAPE,
                                              num_threads=CONSTANTS.INPUT_PIPELINE_THREADS)
        examples = tf.reshape(examples, [-1, CONSTANTS.IMAGE_SHAPE[0],
                                         CONSTANTS.IMAGE_SHAPE[1], CONSTANTS.IMAGE_SHAPE[2]],
                              name='infer/input')
        logits = model.inference(examples)
        loss = model.loss(logits, labels)
        OPTIMIZER = tf.train.AdamOptimizer(CONSTANTS.LEARNING_RATE)
        gradients = OPTIMIZER.compute_gradients(loss)
        apply_gradient_op = OPTIMIZER.apply_gradients(gradients)
        gradients_summary(gradients)
        summaries_op = tf.summary.merge_all()
        return [apply_gradient_op, summaries_op, loss, logits], GRAPH

def main():
    '''
    Run and Train CIFAR 10
    '''
    print('starting...')
    ops, GRAPH = create_sess_ops()
    total_duration = 0.0
    with tf.Session(graph=GRAPH) as SESSION:
        COORDINATOR = tf.train.Coordinator()
        THREADS = tf.train.start_queue_runners(SESSION, COORDINATOR)
        SESSION.run(tf.global_variables_initializer())
        SUMMARY_WRITER = tf.summary.FileWriter('Tensorboard/' + CONSTANTS.MODEL_NAME, graph=GRAPH)
        GRAPH_SAVER = tf.train.Saver()
        for EPOCH in range(CONSTANTS.EPOCHS):
            duration = 0
            error = 0.0
            start_time = time.time()
            for batch in range(CONSTANTS.MINI_BATCHES):
                _, summaries, cost_val, prediction = SESSION.run(ops)
                error += cost_val
            duration += time.time() - start_time
            total_duration += duration
            SUMMARY_WRITER.add_summary(summaries, EPOCH)
            print('Epoch %d: loss = %.2f (%.3f sec)' % (EPOCH, error, duration))
            if EPOCH == CONSTANTS.EPOCHS - 1 or error < 0.005:
                print('Done training for %d epochs. (%.3f sec)' % (EPOCH, total_duration))
                break
        GRAPH_SAVER.save(SESSION, 'models/' + CONSTANTS.MODEL_NAME + '.model')
        with open('models/' + CONSTANTS.MODEL_NAME + '.pkl', 'wb') as output:
            pickle.dump(model, output)
        COORDINATOR.request_stop()
        COORDINATOR.join(THREADS)
```
Step 3: Run some inference. Load your pickled model, create a new graph by piping in the new placeholder to the logits, and then restore the session. DO NOT RESTORE THE WHOLE GRAPH; JUST THE VARIABLES.
```python
MODEL_PATH = 'models/' + CONSTANTS.MODEL_NAME + '.model'
imgs_bsdir = 'C:/data/cifar_10/train/'

images = tf.placeholder(tf.float32, shape=(1, 32, 32, 3))

with open('models/vgg3.pkl', 'rb') as model_in:
    model = pickle.load(model_in)
logits = model.inference(images)

def run_inference():
    '''Runs inference against a loaded model'''
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        new_saver = tf.train.Saver()
        new_saver.restore(sess, MODEL_PATH)
        print("Starting...")
        for i in range(20, 30):
            print(str(i) + '.png')
            img = misc.imread(imgs_bsdir + str(i) + '.png').astype(np.float32) / 255.0
            img = img.reshape(1, 32, 32, 3)
            pred = sess.run(logits, feed_dict={images: img})
            max_node = np.argmax(pred)
            print('predicted label: ' + str(max_node))
        print('done')

run_inference()
```
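Why this works, spelled out (the post leaves it implicit): a plain tf.train.Saver() matches checkpoint entries to the rebuilt graph's variables by name, and the names line up because the pickled class recreates the exact same variable_scopes ('Conv1', 'Dense1', ...). If a restore ever complains, you can diff the two name sets; tf.train.NewCheckpointReader is standard TF1 API:

```python
# Compare what the checkpoint holds against what the rebuilt graph expects;
# both prints should come back empty when the names line up 1:1.
reader = tf.train.NewCheckpointReader(MODEL_PATH)
ckpt_names = set(reader.get_variable_to_shape_map().keys())
graph_names = {v.name.split(':')[0] for v in tf.global_variables()}
print('in checkpoint only:', sorted(ckpt_names - graph_names))
print('in graph only:', sorted(graph_names - ckpt_names))
```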
There are definitely ways to improve on this using interfaces and maybe packaging everything up better; but this is working and sets the stage for how we will be moving forward.
FINAL NOTE: When we finally pushed this to production, we ended up having to ship the stupid `mymodel_model.py` file down with everything to build up the graph. So we now enforce a naming convention for all models, and there is also a coding standard for production model runs so we can do this properly.
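Given that naming convention, the production loader can pull the shipped module in dynamically; a sketch of what that could look like (the file path, the load_model_class helper, and the class name are all assumptions following the hypothetical convention):

```python
import importlib.util

def load_model_class(module_path, class_name):
    '''Load the graph-building class from a shipped model .py file.
    Both arguments follow the (hypothetical) naming convention.'''
    spec = importlib.util.spec_from_file_location('shipped_model', module_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, class_name)

# e.g. the convention might pair mymodel_model.py with class Vgg3Model:
Vgg3Model = load_model_class('models/mymodel_model.py', 'Vgg3Model')
model = Vgg3Model()
```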
Good Luck!