 

Calculate face_descriptor faster

In my face recognition project a face is represented as a 128-dimensional embedding (face_descriptor), as used in FaceNet. I can generate the embedding from an image in two ways.

Using the TensorFlow ResNet v1 model:

emb_array = sess.run(embedding_layer,
                    {images_placeholder: images_array, phase_train_placeholder: False})

An array of images can be passed and a list of embeddings is obtained. This is a bit slow: it took about 1.6 s (though the time stays almost constant for a large number of images). Note: no GPU is available.
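For reference, a runnable sketch of that batched TensorFlow path looks roughly like the following; the frozen-graph path and the tensor names input:0, embeddings:0 and phase_train:0 are assumptions based on common FaceNet exports, so substitute your own model's names.

import numpy as np
import tensorflow as tf

# Load a frozen FaceNet graph (path and tensor names are placeholders)
with tf.gfile.GFile("facenet_frozen.pb", "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name="")
    images_placeholder = graph.get_tensor_by_name("input:0")
    embedding_layer = graph.get_tensor_by_name("embeddings:0")
    phase_train_placeholder = graph.get_tensor_by_name("phase_train:0")

    with tf.Session(graph=graph) as sess:
        # images_array: N aligned, prewhitened face crops, e.g. shape (N, 160, 160, 3)
        images_array = np.zeros((4, 160, 160, 3), dtype=np.float32)
        emb_array = sess.run(embedding_layer,
                             {images_placeholder: images_array,
                              phase_train_placeholder: False})
        print(emb_array.shape)  # (4, 128)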

The other method uses dlib:

dlib.face_recognition_model_v1.compute_face_descriptor(image, shape)

This gives a fast result: almost 0.05 seconds per image. But only one image can be passed at a time, so the total time grows with the number of images.
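For context, the full dlib pipeline per image is roughly the following sketch (it assumes a reasonably recent dlib; the model file names are the standard dlib downloads and the image path is a placeholder):

import dlib

detector = dlib.get_frontal_face_detector()
shape_predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
face_rec = dlib.face_recognition_model_v1("dlib_face_recognition_resnet_model_v1.dat")

image = dlib.load_rgb_image("face.jpg")          # placeholder path
detection = detector(image, 1)[0]                # first detected face
shape = shape_predictor(image, detection)        # 68 landmarks
descriptor = face_rec.compute_face_descriptor(image, shape)  # 128-d dlib.vector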

Is there any way to pass an array of images to calculate embeddings in dlib, or any other way to improve the speed in dlib?

Or is there any other, faster method to generate the 128-dimensional face embedding?

Update: I concatenated multiple images into a single image and passed it to dlib:

dlib.face_recognition_model_v1.compute_face_descriptor(big_image, shapes)

i.e. I converted multiple images, each with a single face, into one image with multiple faces. Still, the time is proportional to the number of images (i.e. the number of faces) concatenated; it is almost the same as iterating over the individual images.
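For reference, the concatenation experiment amounts roughly to this sketch (it assumes same-height RGB crops and the overload of compute_face_descriptor that takes a dlib.full_object_detections container, as in the call above; the file paths are placeholders):

import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
shape_predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
face_rec = dlib.face_recognition_model_v1("dlib_face_recognition_resnet_model_v1.dat")

# Tile same-height face crops side by side into one big image
images = [dlib.load_rgb_image(p) for p in ("face1.jpg", "face2.jpg", "face3.jpg")]
big_image = np.hstack(images)

shapes = dlib.full_object_detections()
for det in detector(big_image, 1):
    shapes.append(shape_predictor(big_image, det))

descriptors = face_rec.compute_face_descriptor(big_image, shapes)  # one 128-d vector per face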

asked Apr 04 '18 by Sreeragh A R


1 Answer

One of the more important aspects of this question is that you have no GPU available. I'm putting this here so that anyone who reads this answer has a better understanding of the context.

There are two major parts to the time consumed for inference. First is the setup time. TensorFlow takes its sweet, sweet time to set itself up when you first run the network, so your measurement of 1.6 seconds is probably 99.9999% setup time and 0.0001% processing your image. Then comes the actual inference calculation, which is probably tiny for one image compared to the setup. A better measurement would be to run 1,000 images through it, then 2,000 images, take the difference and divide by 1,000 to get how much time each image takes to infer.
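A rough sketch of that measurement, reusing sess, embedding_layer and the placeholders from the question (the input shape and batch size are assumptions):

import time
import numpy as np

# sess, embedding_layer, images_placeholder, phase_train_placeholder as in the question

def time_n_images(n, batch_size=100):
    """Run n dummy images through the network and return the elapsed wall time."""
    start = time.time()
    for _ in range(n // batch_size):
        images_array = np.zeros((batch_size, 160, 160, 3), dtype=np.float32)
        sess.run(embedding_layer,
                 {images_placeholder: images_array, phase_train_placeholder: False})
    return time.time() - start

t_1000 = time_n_images(1000)
t_2000 = time_n_images(2000)
print("seconds per image:", (t_2000 - t_1000) / 1000.0)  # one-off setup cost cancels out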

From the look of it, Dlib doesn't spend much time setting up on the first run, but it would still be a better benchmark to do the same as outlined above.

I suspect Tensorflow and Dlib should be fairly similar in terms of execution speed on a CPU because both use optimized linear algebra libraries (BLAS, LAPACK) and there is only so much optimization one can do for matrix multiplication.

There is another thing you might want to give a try, though. Most networks use 32-bit floating point calculations for training and inference, but research shows that in most cases switching over to 8-bit integers for inference doesn't degrade accuracy too much and speeds up inference by a lot.

It is generally better to train a network with later quantization in mind, which is not the case here because you are using a pre-trained model, but you can probably still benefit a lot from quantization. You can quantize your model by basically running a command that is included in TensorFlow (with the surprising name quantize_graph), but there is a little bit more to it. There is a nice quantization tutorial to follow, but keep in mind that the script now lives in tensorflow/tools/quantization and not in contrib any more, as written in the tutorial.
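The quantize_graph script is a command-line tool; if your TF 1.x build also ships the graph_transforms Python wrapper, a roughly equivalent post-training quantization sketch looks like the following. The node names and file paths are placeholders, and whether these particular transforms help on your exact graph is an assumption to verify.

import tensorflow as tf
from tensorflow.tools.graph_transforms import TransformGraph

with tf.gfile.GFile("facenet_frozen.pb", "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

# Rewrite the graph to use 8-bit weights/ops where supported
quantized_def = TransformGraph(graph_def,
                               ["input"],        # input node names (placeholder)
                               ["embeddings"],   # output node names (placeholder)
                               ["quantize_weights", "quantize_nodes"])

with tf.gfile.GFile("facenet_quantized.pb", "wb") as f:
    f.write(quantized_def.SerializeToString())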

answered Sep 24 '22 by Peter Szoldan