
Tensorflow Object Detection Slow when using rtsp stream

I have followed the example here: https://www.youtube.com/watch?v=MoMjIwGSFVQ and have the object detection working with a webcam.

But I have switched from my webcam to an RTSP stream from an IP camera, which I believe is streaming H.264. I now notice a lag of about 30 seconds in the video, and the video is very stop-start at times.

Here is the python code that does the main processing:

import cv2
import numpy as np
import tensorflow as tf

# detection_graph, category_index and vis_util are assumed to be defined
# earlier, as in the tutorial's setup code.
cap = cv2.VideoCapture("rtsp://192.168.200.1:5544/stream1")

# Running the tensorflow session
with detection_graph.as_default():
    with tf.Session(graph=detection_graph) as sess:
        while True:
            ret, image_np = cap.read()
            if not ret:
                # No frame was returned (stream ended or dropped).
                break

            # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
            image_np_expanded = np.expand_dims(image_np, axis=0)
            image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')

            # Each box represents a part of the image where a particular object was detected.
            boxes = detection_graph.get_tensor_by_name('detection_boxes:0')

            # Each score represents the level of confidence for a detected object.
            # Score is shown on the result image, together with the class label.
            scores = detection_graph.get_tensor_by_name('detection_scores:0')
            classes = detection_graph.get_tensor_by_name('detection_classes:0')
            num_detections = detection_graph.get_tensor_by_name('num_detections:0')

            # Actual detection.
            (boxes, scores, classes, num_detections) = sess.run(
                [boxes, scores, classes, num_detections],
                feed_dict={image_tensor: image_np_expanded})

            # Visualization of the results of a detection.
            vis_util.visualize_boxes_and_labels_on_image_array(
                image_np,
                np.squeeze(boxes),
                np.squeeze(classes).astype(np.int32),
                np.squeeze(scores),
                category_index,
                use_normalized_coordinates=True,
                line_thickness=8)

            cv2.imshow('image', cv2.resize(image_np, (1280, 960)))
            if cv2.waitKey(25) & 0xFF == ord('q'):
                break

# Clean up whether the loop ended on 'q' or on a failed read.
cap.release()
cv2.destroyAllWindows()

I am new to Python and TensorFlow. Should this code be modified in any way to cope with the RTSP stream? My PC does not have a GPU card.

Harry Boy asked May 17 '18 08:05


3 Answers

OpenCV's read() function works differently for USB webcams and IP cameras.

On IP cameras it doesn't return the latest frame, but the oldest (next) frame in the buffer.

Since the object detection inference in the loop eats up time, read() quickly gets behind and ends up reading the oldest available frame in the OpenCV buffer.

A solution is to start a thread for the camera that reads frames and fills a queue. Then in another thread, read frames from this queue and run the object detection inference on them.
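
The two-thread approach described above could be sketched roughly as follows. This is a minimal illustration, not the answerer's own code: the class name LatestFrameReader is hypothetical, and it assumes the capture object exposes a read() method returning (ok, frame), as cv2.VideoCapture does. The queue is capped at one entry so the consumer always gets the newest frame rather than a backlog.

```python
import queue
import threading


class LatestFrameReader:
    """Reads frames on a background thread and keeps only the newest one.

    Hypothetical helper: `capture` is any object with a
    read() -> (ok, frame) method, e.g. cv2.VideoCapture("rtsp://...").
    """

    def __init__(self, capture):
        self.capture = capture
        self.frames = queue.Queue(maxsize=1)  # holds at most the newest frame
        self.stopped = threading.Event()
        self.thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self.thread.start()
        return self

    def _run(self):
        while not self.stopped.is_set():
            ok, frame = self.capture.read()
            if not ok:
                break  # stream ended or dropped
            # Discard the stale frame, if the consumer has not taken it yet.
            try:
                self.frames.get_nowait()
            except queue.Empty:
                pass
            # This thread is the only producer, so the queue has room here.
            self.frames.put(frame)
        self.stopped.set()

    def read(self, timeout=1.0):
        """Return the newest frame, or None if none arrives within `timeout`."""
        try:
            return self.frames.get(timeout=timeout)
        except queue.Empty:
            return None

    def stop(self):
        self.stopped.set()
        self.thread.join(timeout=2.0)
```

In the question's loop, this would mean creating reader = LatestFrameReader(cap).start() before the loop and calling image_np = reader.read() inside it, skipping the iteration when the result is None. Frames that arrive while inference is running are dropped, which trades completeness for low latency.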

PeterVennerstrom answered Nov 07 '22 09:11


Without a GPU, TensorFlow can't process high-resolution frames at a high frame rate. It took almost 0.2 seconds to process a 640*480 frame on my machine, so it can handle about 5 frames per second.

There are two ways to make the code run in real time.

  • Reduce resolution of frame
  • Reduce fps

Code

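import cv2

cap = cv2.VideoCapture("rtsp://192.168.200.1:5544/stream1")
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)   # request frame width
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)  # request frame height
cap.set(cv2.CAP_PROP_FPS, 5)             # request 5 fps (cv2.cv.CV_CAP_PROP_FPS in OpenCV 2.x)
# set() returns False when the backend does not support a property.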

Note: TensorFlow object detection performs reasonably well even at low resolutions.
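
The arithmetic behind the frame-rate suggestion can be sketched as follows. This is only an illustration, not part of the original answer: frame_stride and every_nth are hypothetical helpers, and the 30 fps source rate is an assumed value. Only the 0.2 seconds per frame figure comes from the measurement above.

```python
import math


def frame_stride(source_fps, seconds_per_inference):
    """Number of source frames to advance for each frame that is processed.

    Hypothetical helper: a stride of N means "run detection on every Nth frame".
    """
    # Frames that arrive while one inference runs; rounded first so that
    # floating-point noise (e.g. 30 * 0.2 = 6.000000000000001) is not ceil'd up.
    frames_per_inference = round(source_fps * seconds_per_inference, 6)
    return max(1, math.ceil(frames_per_inference))


def every_nth(frames, stride):
    """Yield every `stride`-th item from an iterable of frames."""
    for index, frame in enumerate(frames):
        if index % stride == 0:
            yield frame


# Assumed 30 fps camera, 0.2 s per inference as measured above.
stride = frame_stride(30, 0.2)
print(stride)
```

If the cap.set() calls have no effect on a particular stream, the same idea can be applied in software: run detection only on the frames selected by the stride, and shrink each one with cv2.resize(image_np, (640, 480)) before inference.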

To experience GPU performance, FloydHub provides a free GPU service (limited hours). You can upload your code, run it on FloydHub, and measure the performance. I found the GPU was about 35 times faster than the CPU.

Sreeragh A R answered Nov 07 '22 09:11


If 1080p @ 30fps works fine from your webcam but not over RTSP, it's likely that the extra overhead of decoding the RTSP stream is asking too much of your CPU. It's having trouble doing both the tasks you're asking of it simultaneously. It's also possible that memory is the bottleneck, though that seems less likely.

Many Intel CPUs have integrated GPUs that are capable of decoding video natively. However, I've noticed that under certain conditions, and with certain software, native decoding on select CPUs tends to lag considerably (as much as ~30 sec). That could also be the issue you're encountering here. It may be worth trying this software out on a friend's computer with similar-quality but not identical hardware. You can also test it out on newer hardware in the same price range, as I haven't seen this issue in the latest-generation Intel CPUs.
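
One way to check whether decoding or inference is the bottleneck is to time each stage separately. The following is a rough sketch under stated assumptions, not something from the answer above: StageTimer is a hypothetical helper, and the stage names are arbitrary labels.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named stage (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)  # seconds spent per stage
        self.counts = defaultdict(int)    # number of measurements per stage

    @contextmanager
    def measure(self, stage):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[stage] += time.perf_counter() - start
            self.counts[stage] += 1

    def mean_ms(self, stage):
        """Average duration of one measurement of `stage`, in milliseconds."""
        return 1000.0 * self.totals[stage] / self.counts[stage]
```

In the question's loop, cap.read() could be wrapped in with timer.measure("read"): and the sess.run(...) call in with timer.measure("inference"):, then the two mean_ms values compared after a few hundred frames. Note that the time spent in read() may include waiting on the network as well as decoding, so it is only an upper bound on decode cost.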

Zenexer answered Nov 07 '22 11:11