I've been trying to use TensorFlow's Object Detection API to set up a decent presence detection. I'm using one of TensorFlow's pretrained models and a code example to perform object detection on a webcam feed. Is there any way to remove classes from the model, or to filter the detections so that only the person class is kept? This is the code I currently have.
import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image
from utils import label_map_util
from utils import visualization_utils as vis_util
# # Model preparation
# Any model exported using the `export_inference_graph.py` tool can be loaded here simply by changing `PATH_TO_CKPT` to point to a new .pb file.
# By default we use an "SSD with Mobilenet" model here. See the [detection model zoo](https://github.com/tensorflow/models/blob/master/object_detection/g3doc/detection_model_zoo.md) for a list of other models that can be run out-of-the-box with varying speeds and accuracies.
# What model to download.
MODEL_NAME = 'ssd_mobilenet_v1_coco_11_06_2017'
MODEL_FILE = MODEL_NAME + '.tar.gz'
DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'
# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'
# List of strings used to add the correct label to each box.
PATH_TO_LABELS = os.path.join('data', 'mscoco_label_map.pbtxt')
NUM_CLASSES = 90
# ## Download Model
if not os.path.exists(MODEL_NAME + '/frozen_inference_graph.pb'):
    print('Downloading the model')
    opener = urllib.request.URLopener()
    opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
    tar_file = tarfile.open(MODEL_FILE)
    for file in tar_file.getmembers():
        file_name = os.path.basename(file.name)
        if 'frozen_inference_graph.pb' in file_name:
            tar_file.extract(file, os.getcwd())
    print('Download complete')
else:
    print('Model already exists')
# ## Load a (frozen) Tensorflow model into memory.
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')
# ## Loading label map
# Label maps map indices to category names, so that when our convolutional network predicts `5`, we know that this corresponds to `airplane`. Here we use internal utility functions, but anything that returns a dictionary mapping integers to appropriate string labels would be fine.
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)
# Initializing the web camera device
import cv2
cap = cv2.VideoCapture(0)
# Running the tensorflow session
with detection_graph.as_default():
    with tf.Session(graph=detection_graph) as sess:
        ret = True
        while (ret):
            ret, image_np = cap.read()
            image_np = cv2.resize(image_np, (600, 400))
            # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
            image_np_expanded = np.expand_dims(image_np, axis=0)
            image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
            # Each box represents a part of the image where a particular object was detected.
            boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
            # Each score represents the level of confidence for each of the objects.
            # Score is shown on the result image, together with the class label.
            scores = detection_graph.get_tensor_by_name('detection_scores:0')
            classes = detection_graph.get_tensor_by_name('detection_classes:0')
            num_detections = detection_graph.get_tensor_by_name('num_detections:0')
            b = [x for x in classes if x == 1]
            # Actual detection.
            (boxes, scores, classes, num_detections) = sess.run(
                [boxes, scores, classes, num_detections],
                feed_dict={image_tensor: image_np_expanded})
            # Visualization of the results of a detection.
            vis_util.visualize_boxes_and_labels_on_image_array(
                image_np,
                np.squeeze(boxes),
                np.squeeze(b).astype(np.int32),
                np.squeeze(scores),
                category_index,
                use_normalized_coordinates=True,
                line_thickness=8)
            #print(len(boxes.shape))
            #print(classes)
            final_score = np.squeeze(scores)
            count = 0
            for i in range(100):
                if scores is None or final_score[i] > 0.5:
                    count = count + 1
            print(count, ' object(s) detected...')
            # plt.figure(figsize=IMAGE_SIZE)
            # plt.imshow(image_np)
            cv2.imshow('image', image_np)
            if cv2.waitKey(200) & 0xFF == ord('q'):
                cv2.destroyAllWindows()
                cap.release()
                break
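The script builds `category_index` with `label_map_util.create_category_index`, which yields a dict keyed by class id whose values hold (at least) an `'id'` and a `'name'`. A minimal sketch, assuming that shape, of looking up the id of a class by its display name instead of hard-coding it; the helper `class_ids_for` and the sample dict below are hypothetical illustrations, not part of the Object Detection API:

```python
def class_ids_for(category_index, names):
    """Hypothetical helper: return the sorted class ids whose 'name' is in `names`."""
    wanted = set(names)
    return sorted(cid for cid, cat in category_index.items()
                  if cat.get('name') in wanted)


# Hypothetical excerpt with the same shape as category_index built from
# mscoco_label_map.pbtxt with use_display_name=True.
sample_index = {
    1: {'id': 1, 'name': 'person'},
    2: {'id': 2, 'name': 'bicycle'},
    3: {'id': 3, 'name': 'car'},
}

print(class_ids_for(sample_index, ['person']))  # [1]
```

With the real `category_index`, `class_ids_for(category_index, ['person'])` should return the person id used in the filtering below.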
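I saw that you used a filter in the line b = [x for x in classes if x == 1] to get only the person detections (in the label map, the id of person is exactly 1). It didn't work because boxes, scores and classes must all be filtered together so that they stay aligned. Also, that line runs before sess.run(), when classes is still a graph tensor rather than an array of results. Try this: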
First, remove the line
b = [x for x in classes if x == 1]
Then add the following after the sess.run() call:
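boxes = np.squeeze(boxes)
scores = np.squeeze(scores)
classes = np.squeeze(classes)
# Indices of the detections whose class id is 1 (person).
# .flatten() instead of a second np.squeeze keeps boxes 2-D
# even when exactly one person is detected.
indices = np.argwhere(classes == 1).flatten()
boxes = boxes[indices]
scores = scores[indices]
classes = classes[indices]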
Then call the visualization function with the filtered arrays:
vis_util.visualize_boxes_and_labels_on_image_array(
    image_np,
    boxes,
    classes.astype(np.int32),
    scores,
    category_index,
    use_normalized_coordinates=True,
    line_thickness=8)
The idea is that the model still produces detections for all classes, but only the person detections are passed to the visualization and drawn on the image.
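The same filtering can be wrapped in a small helper, which also makes a person count for presence detection straightforward. A minimal sketch, assuming the squeezed outputs of `sess.run()` (`boxes` of shape (N, 4), `scores` and `classes` of shape (N,)); the function names, `PERSON_ID`, and the 0.5 threshold (borrowed from the counting loop in the question) are illustrative choices, not part of the API. One boolean mask is applied to all three arrays so they stay aligned, and `boxes` stays 2-D even with a single detection (whereas `np.squeeze` would collapse a (1, 4) array to shape (4,)):

```python
import numpy as np

PERSON_ID = 1  # id of 'person' in mscoco_label_map.pbtxt (assumed COCO label map)


def filter_detections(boxes, scores, classes, keep_ids=(PERSON_ID,), min_score=0.0):
    """Keep detections whose class is in keep_ids and whose score >= min_score.

    The same boolean mask is applied to boxes, scores and classes,
    so the returned arrays remain aligned with each other.
    """
    boxes = np.asarray(boxes).reshape(-1, 4)
    scores = np.asarray(scores).reshape(-1)
    classes = np.asarray(classes).reshape(-1).astype(np.int32)
    mask = np.isin(classes, keep_ids) & (scores >= min_score)
    return boxes[mask], scores[mask], classes[mask]


def count_people(scores, classes, min_score=0.5):
    """Count person detections scoring above min_score (a simple presence signal)."""
    scores = np.asarray(scores).reshape(-1)
    classes = np.asarray(classes).reshape(-1).astype(np.int32)
    return int(np.count_nonzero((classes == PERSON_ID) & (scores > min_score)))


# Synthetic example data (not real model output).
boxes = np.array([[0.1, 0.1, 0.5, 0.5],
                  [0.2, 0.2, 0.6, 0.6],
                  [0.3, 0.3, 0.7, 0.7]], dtype=np.float32)
scores = np.array([0.9, 0.8, 0.4], dtype=np.float32)
classes = np.array([1.0, 3.0, 1.0], dtype=np.float32)

b, s, c = filter_detections(boxes, scores, classes)
print(b.shape, c.tolist())            # (2, 4) [1, 1]
print(count_people(scores, classes))  # 1
```

Inside the webcam loop, something like `boxes, scores, classes = filter_detections(np.squeeze(boxes), np.squeeze(scores), np.squeeze(classes))` could stand in for the `np.argwhere` block, and the returned arrays (already `int32` classes) can be passed to `vis_util.visualize_boxes_and_labels_on_image_array` directly.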