Tensorflow object detection API RCNN is slow on CPU: 1 frame per min

I am using a locally trained model from the TensorFlow Object Detection API, starting from the faster_rcnn_inception_resnet_v2_atrous_coco_11_06_2017 checkpoint. I retrained it as a one-class model and exported it to a SavedModel:

python object_detection/export_inference_graph.py \
    --input_type image_tensor \
    --pipeline_config_path ${PIPELINE_CONFIG_PATH} \
    --trained_checkpoint_prefix /Users/Ben/Dropbox/GoogleCloud/Detection/train/model.ckpt-186 \
    --output_directory /Users/Ben/Dropbox/GoogleCloud/Detection/SavedModel/

While I understand that there are shallower models, the reported run times for Faster RCNN are more than 100x faster than what I'm seeing. Can anyone share their Faster RCNN run time on CPU to corroborate? I'm trying to tell whether it's a problem with my code or whether I should just move to a smaller model.

I took the code from the Jupyter notebook with very few changes. I am running in a clean virtualenv with nothing but the requirements installed.

detection_predict.py

import datetime
import glob
import os

import numpy as np
import tensorflow as tf
from matplotlib import pyplot as plt
from PIL import Image
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util

TEST_IMAGE_PATHS = glob.glob("/Users/Ben/Dropbox/GoogleCloud/Detection/images/validation/*.jpg")

# Size, in inches, of the output images.
IMAGE_SIZE = (12, 8)
NUM_CLASSES = 1

sess = tf.Session()
tf.saved_model.loader.load(sess, [tf.saved_model.tag_constants.SERVING], "/Users/ben/Dropbox/GoogleCloud/Detection/SavedModel/saved_model/")

label_map = label_map_util.load_labelmap("label.pbtxt")
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)

def load_image_into_numpy_array(image):
    # Convert a PIL image to a (height, width, 3) uint8 array; assumes a 3-channel image.
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape((im_height, im_width, 3)).astype(np.uint8)

# Define input and output tensors for sess.graph
image_tensor = sess.graph.get_tensor_by_name('image_tensor:0')

# Each box represents a part of the image where a particular object was detected.
detection_boxes = sess.graph.get_tensor_by_name('detection_boxes:0')

# Each score represents the level of confidence for the corresponding detected object.
# The score is shown on the result image, together with the class label.
detection_scores = sess.graph.get_tensor_by_name('detection_scores:0')
detection_classes = sess.graph.get_tensor_by_name('detection_classes:0')
num_detections = sess.graph.get_tensor_by_name('num_detections:0')

for image_path in TEST_IMAGE_PATHS:
    image = Image.open(image_path)

    #basewidth = 300
    #wpercent = (basewidth/float(image.size[0]))
    #hsize = int((float(image.size[1])*float(wpercent)))
    #image = image.resize((basewidth,hsize), Image.ANTIALIAS)

    # the array based representation of the image will be used later in order to prepare the
    # result image with boxes and labels on it.
    image_np = load_image_into_numpy_array(image)

    # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
    image_np_expanded = np.expand_dims(image_np, axis=0)
    # Actual detection.
    before = datetime.datetime.now()
    (boxes, scores, classes, num) = sess.run(
        [detection_boxes, detection_scores, detection_classes, num_detections],
        feed_dict={image_tensor: image_np_expanded})
    print("Prediction took : " + str(datetime.datetime.now() - before))

    # Visualization of the results of a detection.
    vis_util.visualize_boxes_and_labels_on_image_array(image_np, np.squeeze(boxes), np.squeeze(classes).astype(np.int32), np.squeeze(scores), category_index, use_normalized_coordinates=True,line_thickness=8)
    plt.figure(figsize=IMAGE_SIZE)
    fn=os.path.basename(image_path)
    plt.imsave("/Users/Ben/Dropbox/GoogleCloud/Detection/validation/" + fn,image_np)

yields

(detection) Bens-MacBook-Pro:Detection ben$ python detection_predict.py 

Prediction took : 0:00:51.475269
Prediction took : 0:00:43.955962

Resizing the images (the commented-out block above) makes no difference. They aren't enormous (1280 x 720).

Is this expected?
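For anyone comparing numbers, here is a minimal sketch of a timing helper that separates the first call from the rest. The first prediction in each run above is the slowest, so averaging it in with the others can skew a comparison. The names `time_calls` and `mean` are hypothetical helpers, not part of TensorFlow or the Object Detection API, and `time.time()` is used only because it works on both Python 2.7 and Python 3.

```python
import time


def time_calls(fn, inputs, warmup=1):
    """Call fn on every input; return per-call seconds, skipping the first `warmup` calls."""
    timings = []
    for i, item in enumerate(inputs):
        start = time.time()
        fn(item)
        elapsed = time.time() - start
        # Warm-up calls still run, but their timings are not recorded.
        if i >= warmup:
            timings.append(elapsed)
    return timings


def mean(values):
    """Arithmetic mean of a sequence, or None if it is empty."""
    values = list(values)
    return sum(values) / float(len(values)) if values else None
```

With the script above, `fn` could be a small wrapper around the existing `sess.run(...)` call and `inputs` a list of expanded image arrays; that wiring is an assumption and is not shown here.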

System information

Latest TensorFlow version

Bens-MacBook-Pro:Detection ben$ python
Python 2.7.10 (default, Feb  7 2017, 00:08:15) 
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.__version__
'1.3.0'

EDIT #1

In case anyone is wondering, predicting from the frozen inference graph makes no difference.

detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile("/Users/ben/Dropbox/GoogleCloud/Detection/SavedModel/frozen_inference_graph.pb", 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')

(detection) Bens-MacBook-Pro:Detection ben$ python detection_predict.py 

Prediction took : 0:01:02.651046
Prediction took : 0:00:43.820992
Prediction took : 0:00:48.805432

cProfile isn't particularly illuminating: nearly all of the time is spent inside TF_Run.

>>> stats.print_stats(20)
Thu Oct 19 14:55:47 2017    profiling_results

         40742812 function calls (38600273 primitive calls) in 173.800 seconds

   Ordered by: internal time
   List reduced from 4918 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        3  138.345   46.115  138.345   46.115 {_pywrap_tensorflow_internal.TF_Run}
977635/702731    2.852    0.000    9.200    0.000 /Users/ben/Documents/DeepMeerkat/training/Detection/detection/lib/python2.7/site-packages/google/protobuf/internal/python_message.py:469(init)
        3    2.597    0.866    2.597    0.866 {matplotlib._png.write_png}
    10719    2.111    0.000    2.114    0.000 {numpy.core.multiarray.array}
   363351    1.378    0.000    3.216    0.000 /Users/ben/Documents/DeepMeerkat/training/Detection/detection/lib/python2.7/site-packages/google/protobuf/internal/python_message.py:424(MakeSubMessageDefault)
  1045442    1.342    0.000    1.342    0.000 {_weakref.proxy}
562666/310637    1.317    0.000    6.182    0.000 /Users/ben/Documents/DeepMeerkat/training/Detection/detection/lib/python2.7/site-packages/google/protobuf/internal/python_message.py:1211(MergeFrom)
   931022    1.268    0.000    3.113    0.000 /Users/ben/Documents/DeepMeerkat/training/Detection/detection/lib/python2.7/site-packages/google/protobuf/internal/python_message.py:777(ListFields)
789671/269414    1.122    0.000    9.116    0.000 /Users/ben/Documents/DeepMeerkat/training/Detection/detection/lib/python2.7/site-packages/google/protobuf/internal/python_message.py:1008(ByteSize)
  1045442    0.882    0.000    2.498    0.000 /Users/ben/Documents/DeepMeerkat/training/Detection/detection/lib/python2.7/site-packages/google/protobuf/internal/python_message.py:1375(__init__)
3086143/3086140    0.662    0.000    0.756    0.000 {isinstance}
  1427511    0.656    0.000    0.782    0.000 /Users/ben/Documents/DeepMeerkat/training/Detection/detection/lib/python2.7/site-packages/google/protobuf/internal/python_message.py:762(_IsPresent)
   931092    0.649    0.000    0.879    0.000 {method 'sort' of 'list' objects}
1189105/899500    0.599    0.000    0.942    0.000 /Users/ben/Documents/DeepMeerkat/training/Detection/detection/lib/python2.7/site-packages/google/protobuf/internal/python_message.py:1330(Modified)
        1    0.537    0.537    0.537    0.537 {_pywrap_tensorflow_internal.TF_ExtendGraph}
276877/45671    0.480    0.000    8.315    0.000 /Users/ben/Documents/DeepMeerkat/training/Detection/detection/lib/python2.7/site-packages/google/protobuf/internal/python_message.py:1050(InternalSerialize)
  2602117    0.480    0.000    0.480    0.000 {method 'items' of 'dict' objects}
   459805    0.474    0.000    1.336    0.000 /Users/ben/Documents/DeepMeerkat/training/Detection/detection/lib/python2.7/site-packages/google/protobuf/internal/containers.py:551(__getitem__)
        1    0.434    0.434   16.605   16.605 /Users/ben/Documents/DeepMeerkat/training/Detection/detection/lib/python2.7/site-packages/tensorflow/python/framework/importer.py:156(import_graph_def)
  1297794    0.367    0.000    0.367    0.000 {method 'write' of '_io.BytesIO' objects}
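For readers who want to reproduce a listing like the one above, here is a minimal sketch using the standard library's cProfile and pstats. The helper names `profile_to_file` and `print_top` are hypothetical, and the exact commands used to produce the output above are not shown in the question, so treat this as one plausible way to get there.

```python
import cProfile
import pstats


def profile_to_file(func, stats_path):
    """Run func() under cProfile and dump the raw stats to stats_path."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    profiler.dump_stats(stats_path)


def print_top(stats_path, limit=20):
    """Print the `limit` most expensive entries, ordered by internal time."""
    stats = pstats.Stats(stats_path)
    stats.sort_stats("time").print_stats(limit)
    return stats
```

In the listing above, TF_Run accounts for 138.345 of the 173.800 seconds, roughly 80%. cProfile only sees Python-level calls, so it cannot break that native call down any further.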

EDIT #2

After pushing hard on this, I'm starting to suspect that the people who reported faster times were not rigorous in documenting their environment. Some GPU reference timings are here for those interested:

https://github.com/tensorflow/models/issues/1715

I'm leaving the question open in the hope that someone will report their CPU time for the largest model, but for the moment I'm proceeding on the assumption that these times are correct and moving to the shallower models. Perhaps this will help others decide which model to choose.

bw4sz asked Sep 02 '25 16:09


1 Answer

Hopefully this will help other users choose a model. Here are my average times on a 3.1 GHz CPU on OS X (more info above).

faster_rcnn_inception_resnet_v2_atrous_coco: 45 sec/image

faster_rcnn_resnet101_coco: 16 sec/image

rfcn_resnet101_coco: 7 sec/image

ssd_inception_v2_coco: 0.3 sec/image

ssd_mobilenet_v1_coco: 0.3 sec/image
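To put these numbers in the "frames per minute" terms used in the question title, here is a small sketch of the conversion. The function names are hypothetical; the inputs in the usage note are the 45 and 0.3 sec/image figures from the list above.

```python
def frames_per_minute(seconds_per_image):
    """Convert an average latency in seconds per image to frames per minute."""
    if seconds_per_image <= 0:
        raise ValueError("seconds_per_image must be positive")
    return 60.0 / seconds_per_image


def speedup(slow_seconds, fast_seconds):
    """How many times faster the second latency is than the first."""
    if fast_seconds <= 0:
        raise ValueError("fast_seconds must be positive")
    return slow_seconds / float(fast_seconds)
```

For example, `frames_per_minute(45)` is a little over one frame per minute for the largest model, `frames_per_minute(0.3)` is about 200 for the SSD models, and `speedup(45, 0.3)` is about 150.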

bw4sz answered Sep 04 '25 18:09
