
Dlib webcam capture with face detection and shape prediction is slow

I am working on a C++ program that should detect faces in a webcam stream, then crop them using face landmarks and swap them.

I implemented face detection using OpenCV's Viola-Jones detector, and it works fine. Then I looked for a way to segment just the face from the ROI. I tried a few skin detection implementations, but none was successful.

Then I found dlib's face landmarks and decided to try them. Right at the beginning I ran into problems because I had to convert cv::Mat to cv_image, Rect to rectangle, etc. So I tried to do it with dlib alone: I grab the stream using cv::VideoCapture and show what is captured in a dlib image_window. But here is the problem: it is really slow. The code I used is below. The commented-out lines do the same thing using OpenCV. The OpenCV version is much faster and smoother than the uncommented code, which runs at about 5 FPS. That's horrible. I can't imagine how slow it will be once I add face detection and face landmarks.

Am I doing something wrong? How can I make it faster? Or should I use OpenCV for video capture and display?

cv::VideoCapture cap;
image_window output_frame;

if (!cap.open(0))
{
    cout << "ERROR: Opening video device 0 FAILED." << endl;
    return -1;
}

cv::Mat cap_frame;
//HWND hwnd;
do
{
    cap >> cap_frame;

    if (!cap_frame.empty())
    {
        cv_image<bgr_pixel> dlib_frame(cap_frame);
        output_frame.set_image(dlib_frame);
        //cv::imshow("output",dlib::toMat(dlib_frame));
    }

    //if (27 == char(cv::waitKey(10)))
    //{
    //  return 0;
    //}

    //hwnd = FindWindowA(NULL, "output");
} while (!output_frame.is_closed()); // while (hwnd != NULL);

EDIT: After switching to Release mode, showing captured frames works fine. But then I went on and tried face detection and shape prediction with dlib, just like in the example at http://dlib.net/face_landmark_detection_ex.cpp.html. It was quite laggy. So I turned off shape prediction. Still laggy.

So I assumed face detection was slowing it down. I tried face detection using OpenCV instead, because it was significantly better than the dlib detector. I needed to convert the detected cv::Rect to dlib::rectangle, and I used this:

std::vector<dlib::rectangle> dlib_rois;
long l, t, r, b;

for (int i = cv_rois.size() - 1; i >= 0; i--)
{
    l = cv_rois[i].x;
    t = cv_rois[i].y;
    r = cv_rois[i].x + cv_rois[i].width;
    b = cv_rois[i].y + cv_rois[i].height;
    dlib_rois.push_back(dlib::rectangle(l, t, r, b));
}

But this combination of OpenCV face detection and dlib shape prediction became brutally laggy. It takes about 4 s to process a single frame.

I can't figure out why. OpenCV face detection was absolutely fine, and dlib shape prediction doesn't seem to be hard to process. Can somebody help me with this?

Gondil asked Mar 27 '16 10:03



1 Answer

You can take several steps to make Dlib run faster before assuming that it is slow. You only need to read more of the documentation and experiment.

  • Dlib is capable of detecting faces in very small areas (80x80 pixels). You are probably sending raw webcam frames at approximately 1280x720 resolution, which is not necessary. From my experience, I recommend reducing the frames to about a quarter of the original size in each dimension. Yes, 320x180 is fine for Dlib. As a result, you should get about a 4x speedup.

  • As mentioned in the comments, turning on compiler optimizations when building Dlib gives a significant improvement in speed.

  • Dlib works faster with grayscale images. You do not need color in the webcam frame. You can use OpenCV to convert the already downscaled frame to grayscale.

  • Dlib takes its time finding faces but is extremely fast at finding landmarks on faces. Provided your webcam delivers a high frame rate (24-30 fps), you can skip some frames, because faces normally don't move much from one frame to the next.
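The frame-skipping idea in the last point can be sketched without any dlib or OpenCV calls. The helper below is only an illustration under stated assumptions: `DetectionScheduler` and its `detect_every` parameter are hypothetical names, not part of either library. It decides on which frames the expensive detector should run, so that the rectangles from the previous detection can be reused on the frames in between.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of dlib or OpenCV): decides on which frames
// the expensive face detector should run. On all other frames the caller
// reuses the rectangles from the most recent detection.
class DetectionScheduler {
public:
    // detect_every = 3 means: detect on frames 0, 3, 6, ...
    explicit DetectionScheduler(std::size_t detect_every)
        : detect_every_(detect_every) {
        assert(detect_every_ > 0);
    }

    // Call exactly once per captured frame.
    // Returns true when the detector should run on this frame.
    bool should_detect() {
        const bool run = (frame_index_ % detect_every_ == 0);
        ++frame_index_;
        return run;
    }

private:
    std::size_t detect_every_;
    std::size_t frame_index_ = 0;
};
```

In a capture loop, one would call `should_detect()` once per frame, run the face detector only when it returns true, and otherwise run the shape predictor on the cached rectangles. The interval is an assumption to tune against the camera's actual frame rate.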

Given those optimizations, I am confident you will get at least 12x faster detection.
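One detail to watch when detecting on a downscaled frame: the rectangles come back in small-frame coordinates, and dlib::rectangle stores inclusive right and bottom edges (its width is right - left + 1), whereas cv::Rect stores a width and a height. The sketch below illustrates that bookkeeping in plain C++ only. `PixelRect`, `InclusiveBox`, and `to_full_res_box` are hypothetical stand-ins, not dlib or OpenCV types, and the 0.25 scale in the example simply matches the quarter-size suggestion above.

```cpp
#include <cassert>
#include <cmath>

// Stand-in for cv::Rect: top-left corner plus width and height.
struct PixelRect { int x, y, width, height; };

// Stand-in for dlib::rectangle: all four edges are inclusive pixel
// coordinates, so width = right - left + 1.
struct InclusiveBox { long left, top, right, bottom; };

// Map a rectangle found on a frame downscaled by `scale` (e.g. 0.25)
// back to full-resolution, inclusive-edge coordinates.
inline InclusiveBox to_full_res_box(const PixelRect& r, double scale) {
    assert(scale > 0.0 && r.width > 0 && r.height > 0);
    const long left   = std::lround(r.x / scale);
    const long top    = std::lround(r.y / scale);
    const long width  = std::lround(r.width / scale);
    const long height = std::lround(r.height / scale);
    // Subtract 1 because right/bottom name the last pixel inside the box.
    return InclusiveBox{left, top, left + width - 1, top + height - 1};
}
```

For example, a 30x40 box at (10, 20) found on a frame scaled by 0.25 maps to left = 40, top = 80, right = 159, bottom = 239 at full resolution. Those four numbers could then be passed to dlib::rectangle(left, top, right, bottom) before running the shape predictor on the full-size frame, assuming that is where the landmarks are wanted.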

Ezequiel Adrian answered Oct 02 '22 00:10