 

Face Detection with Camera

How can I do face detection in real time, just as the "Camera" app does?


I noticed that AVCaptureStillImageOutput is deprecated after iOS 10, so I use AVCapturePhotoOutput instead. However, the image I save for facial detection does not give satisfactory results. Any ideas?


UPDATE

After trying what @Shravya Boggarapu mentioned, I currently use AVCaptureMetadataOutput to detect the face, without CIFaceDetector. It works as expected. However, when I try to draw the bounds of the face, the rectangle is mislocated. Any idea?


```swift
let metaDataOutput = AVCaptureMetadataOutput()

captureSession.sessionPreset = AVCaptureSessionPresetPhoto

let backCamera = AVCaptureDevice.defaultDevice(withDeviceType: .builtInWideAngleCamera, mediaType: AVMediaTypeVideo, position: .back)
do {
    let input = try AVCaptureDeviceInput(device: backCamera)

    if (captureSession.canAddInput(input)) {
        captureSession.addInput(input)

        // MetadataOutput instead
        if (captureSession.canAddOutput(metaDataOutput)) {
            captureSession.addOutput(metaDataOutput)

            metaDataOutput.setMetadataObjectsDelegate(self, queue: DispatchQueue.main)
            metaDataOutput.metadataObjectTypes = [AVMetadataObjectTypeFace]

            previewLayer = AVCaptureVideoPreviewLayer(session: captureSession)
            previewLayer?.frame = cameraView.bounds
            previewLayer?.videoGravity = AVLayerVideoGravityResizeAspectFill

            cameraView.layer.addSublayer(previewLayer!)
            captureSession.startRunning()
        }
    }
} catch {
    print(error.localizedDescription)
}
```

and

```swift
extension CameraViewController: AVCaptureMetadataOutputObjectsDelegate {
    func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputMetadataObjects metadataObjects: [Any]!, from connection: AVCaptureConnection!) {
        if findFaceControl {
            findFaceControl = false
            for metadataObject in metadataObjects {
                if (metadataObject as AnyObject).type == AVMetadataObjectTypeFace {
                    print("😇😍😎")
                    print(metadataObject)
                    let bounds = (metadataObject as! AVMetadataFaceObject).bounds
                    print("origin x: \(bounds.origin.x)")
                    print("origin y: \(bounds.origin.y)")
                    print("size width: \(bounds.size.width)")
                    print("size height: \(bounds.size.height)")
                    print("cameraView width: \(self.cameraView.frame.width)")
                    print("cameraView height: \(self.cameraView.frame.height)")
                    var face = CGRect()
                    face.origin.x = bounds.origin.x * self.cameraView.frame.width
                    face.origin.y = bounds.origin.y * self.cameraView.frame.height
                    face.size.width = bounds.size.width * self.cameraView.frame.width
                    face.size.height = bounds.size.height * self.cameraView.frame.height
                    print(face)

                    showBounds(at: face)
                }
            }
        }
    }
}
```

Original

See the full project on GitHub.

```swift
var captureSession = AVCaptureSession()
var photoOutput = AVCapturePhotoOutput()
var previewLayer: AVCaptureVideoPreviewLayer?

override func viewWillAppear(_ animated: Bool) {
    super.viewWillAppear(true)

    captureSession.sessionPreset = AVCaptureSessionPresetHigh

    let backCamera = AVCaptureDevice.defaultDevice(withMediaType: AVMediaTypeVideo)
    do {
        let input = try AVCaptureDeviceInput(device: backCamera)

        if (captureSession.canAddInput(input)) {
            captureSession.addInput(input)

            if (captureSession.canAddOutput(photoOutput)) {
                captureSession.addOutput(photoOutput)
                captureSession.startRunning()

                previewLayer = AVCaptureVideoPreviewLayer(session: captureSession)
                previewLayer?.videoGravity = AVLayerVideoGravityResizeAspectFill
                previewLayer?.frame = cameraView.bounds

                cameraView.layer.addSublayer(previewLayer!)
            }
        }
    } catch {
        print(error.localizedDescription)
    }
}

func captureImage() {
    let settings = AVCapturePhotoSettings()
    let previewPixelType = settings.availablePreviewPhotoPixelFormatTypes.first!
    let previewFormat = [kCVPixelBufferPixelFormatTypeKey as String: previewPixelType]
    settings.previewPhotoFormat = previewFormat
    photoOutput.capturePhoto(with: settings, delegate: self)
}

func capture(_ captureOutput: AVCapturePhotoOutput, didFinishProcessingPhotoSampleBuffer photoSampleBuffer: CMSampleBuffer?, previewPhotoSampleBuffer: CMSampleBuffer?, resolvedSettings: AVCaptureResolvedPhotoSettings, bracketSettings: AVCaptureBracketedStillImageSettings?, error: Error?) {
    if let error = error {
        print(error.localizedDescription)
    }
    // Not including previewPhotoSampleBuffer
    if let sampleBuffer = photoSampleBuffer,
        let dataImage = AVCapturePhotoOutput.jpegPhotoDataRepresentation(forJPEGSampleBuffer: sampleBuffer, previewPhotoSampleBuffer: nil) {
        self.imageView.image = UIImage(data: dataImage)
        self.imageView.isHidden = false
        self.previewLayer?.isHidden = true
        self.findFace(img: self.imageView.image!)
    }
}
```

findFace works with a normal image. However, with an image captured via the camera, it either detects no faces or sometimes recognizes only one face.

Normal Image


Capture Image


```swift
func findFace(img: UIImage) {
    guard let faceImage = CIImage(image: img) else { return }
    let accuracy = [CIDetectorAccuracy: CIDetectorAccuracyHigh]
    let faceDetector = CIDetector(ofType: CIDetectorTypeFace, context: nil, options: accuracy)

    // For converting the Core Image coordinates to UIView coordinates
    let detectedImageSize = faceImage.extent.size
    var transform = CGAffineTransform(scaleX: 1, y: -1)
    transform = transform.translatedBy(x: 0, y: -detectedImageSize.height)

    if let faces = faceDetector?.features(in: faceImage, options: [CIDetectorSmile: true, CIDetectorEyeBlink: true]) {
        for face in faces as! [CIFaceFeature] {
            // Apply the transform to convert the coordinates
            var faceViewBounds = face.bounds.applying(transform)

            // Calculate the actual position and size of the rectangle in the image view
            let viewSize = imageView.bounds.size
            let scale = min(viewSize.width / detectedImageSize.width,
                            viewSize.height / detectedImageSize.height)
            let offsetX = (viewSize.width - detectedImageSize.width * scale) / 2
            let offsetY = (viewSize.height - detectedImageSize.height * scale) / 2

            faceViewBounds = faceViewBounds.applying(CGAffineTransform(scaleX: scale, y: scale))
            print("faceBounds = \(faceViewBounds)")
            faceViewBounds.origin.x += offsetX
            faceViewBounds.origin.y += offsetY

            showBounds(at: faceViewBounds)
        }

        if faces.count != 0 {
            print("Number of faces: \(faces.count)")
        } else {
            print("No faces 😢")
        }
    }
}

func showBounds(at bounds: CGRect) {
    let indicator = UIView(frame: bounds)
    indicator.layer.borderWidth = 3
    indicator.layer.borderColor = UIColor.red.cgColor
    indicator.backgroundColor = .clear

    self.imageView.addSubview(indicator)
    faceBoxes.append(indicator)
}
```
Asked by Willjay on Dec 28 '16


1 Answer

There are two ways to detect faces: CIFaceDetector and AVCaptureMetadataOutput. Depending on your requirements, choose what is relevant for you.

CIFaceDetector has more features: it gives you the location of the eyes and mouth, a smile detector, and so on.
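
For reference, here is a minimal sketch of reading those extra features from CIFaceFeature. The `faceImage` input is an assumed CIImage (e.g. built the same way as in the question's findFace); the property names come from Core Image:

```swift
import CoreImage

// Minimal sketch of the extra per-face features CIFaceDetector exposes.
// `faceImage` is an assumed CIImage, e.g. CIImage(image: someUIImage).
func inspectFaces(in faceImage: CIImage) {
    let detector = CIDetector(ofType: CIDetectorTypeFace,
                              context: nil,
                              options: [CIDetectorAccuracy: CIDetectorAccuracyHigh])
    let options: [String: Any] = [CIDetectorSmile: true, CIDetectorEyeBlink: true]

    for case let face as CIFaceFeature in detector?.features(in: faceImage, options: options) ?? [] {
        print("face bounds (Core Image coordinates): \(face.bounds)")
        if face.hasLeftEyePosition  { print("left eye:  \(face.leftEyePosition)") }
        if face.hasRightEyePosition { print("right eye: \(face.rightEyePosition)") }
        if face.hasMouthPosition    { print("mouth:     \(face.mouthPosition)") }
        print("smiling: \(face.hasSmile), eyes closed: \(face.leftEyeClosed || face.rightEyeClosed)")
    }
}
```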

On the other hand, AVCaptureMetadataOutput is computed on the video frames, the detected faces are tracked, and there is no extra code for us to add. I find that, because of the tracking, faces are detected more reliably this way. The downside is that you simply detect faces, not the positions of the eyes or mouth. Another advantage of this method is that orientation issues are smaller: you can update videoOrientation whenever the device orientation changes, and the orientation of the faces will be relative to that orientation.
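
As a sketch of that orientation point (assuming the metaDataOutput and previewLayer configured in the question, and the iOS 10-era API names used there), the metadata connection can be kept in sync with the UI orientation like this:

```swift
// Sketch: keep the metadata (and preview) connections in sync with the device
// orientation, so the reported face bounds stay relative to the current orientation.
// `metaDataOutput` and `previewLayer` are assumed to be the ones set up in the question.
func updateCaptureOrientation(to orientation: AVCaptureVideoOrientation) {
    if let connection = metaDataOutput.connection(withMediaType: AVMediaTypeVideo),
       connection.isVideoOrientationSupported {
        connection.videoOrientation = orientation
    }
    if let previewConnection = previewLayer?.connection,
       previewConnection.isVideoOrientationSupported {
        previewConnection.videoOrientation = orientation
    }
}
```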

In my case, my application required YUV420 as the pixel format, so using CIDetector (which works with RGB) in real time was not viable. Using AVCaptureMetadataOutput saved a lot of effort and performed more reliably thanks to continuous tracking.
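
For context, here is a sketch of what pinning a video data output to YUV420 looks like with the same iOS 10-era API (`captureSession` is assumed to be the one from the question). CIDetector would still need these buffers converted to RGB, which is part of why AVCaptureMetadataOutput was the cheaper route here:

```swift
// Sketch: a video data output delivering YUV420 (NV12) frames.
let videoOutput = AVCaptureVideoDataOutput()
videoOutput.videoSettings = [
    kCVPixelBufferPixelFormatTypeKey as String:
        Int(kCVPixelFormatType_420YpCbCr8BiPlanarFullRange)
]
videoOutput.alwaysDiscardsLateVideoFrames = true
if captureSession.canAddOutput(videoOutput) {
    captureSession.addOutput(videoOutput)
}
```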

Once I had the bounding boxes for the faces, I coded extra features, such as skin detection, and applied them to the still image.

Note: When you capture a still image, the face box information is added along with the metadata so there are no sync issues.

You can also use a combination of the two to get better results.

Explore and evaluate the pros and cons as per your application.


The face rectangle is relative to the image origin, so on screen it may be different. Use:

```objc
for (AVMetadataFaceObject *faceFeatures in metadataObjects) {
    CGRect face = faceFeatures.bounds;
    CGRect facePreviewBounds = CGRectMake(face.origin.y * previewLayerRect.size.width,
                                          face.origin.x * previewLayerRect.size.height,
                                          face.size.width * previewLayerRect.size.height,
                                          face.size.height * previewLayerRect.size.width);

    /* Draw rectangle facePreviewBounds on screen */
}
```
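
If you prefer, here is a Swift sketch of the same idea, using the question's previewLayer and showBounds: AVCaptureVideoPreviewLayer's transformedMetadataObject(for:) lets the layer handle the normalized-coordinate flip, orientation, and videoGravity for you.

```swift
// Sketch: let the preview layer convert metadata coordinates into its own coordinate space.
func captureOutput(_ captureOutput: AVCaptureOutput!,
                   didOutputMetadataObjects metadataObjects: [Any]!,
                   from connection: AVCaptureConnection!) {
    for metadataObject in metadataObjects {
        guard let faceObject = metadataObject as? AVMetadataFaceObject,
              let transformed = previewLayer?.transformedMetadataObject(for: faceObject) else {
            continue
        }
        // transformed.bounds is now in the preview layer's coordinate space.
        showBounds(at: transformed.bounds)
    }
}
```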
Answered by Shravya Boggarapu