
How to get object rect/coordinates from VNClassificationObservation

I have an issue with getting the object rect/coordinates from VNClassificationObservation.

My goal is to recognize the object and display a popup with the object name. I'm able to get the name, but I can't get the object's coordinates or frame.

Here is the code:

let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: requestOptions)
do {
    try handler.perform([classificationRequest, detectFaceRequest])
} catch {
    print(error)
}

Then I handle the results:

func handleClassification(request: VNRequest, error: Error?) {
    guard let observations = request.results as? [VNClassificationObservation] else {
        fatalError("unexpected result type from VNCoreMLRequest")
    }

    // Filter observations: look at the first 11 results at most
    // (prefix avoids a crash when fewer are returned) and keep confident ones
    let filteredObservations = observations.prefix(11).filter({ $0.confidence > 0.1 })

    // Update UI
    DispatchQueue.main.async { [weak self] in
        for observation in filteredObservations {
            print("observation: ", observation.identifier)
            // HERE: I need to display popup with observation name
        }
    }
}


lazy var classificationRequest: VNCoreMLRequest = {
    // Load the ML model through its generated class and create a Vision request for it.
    do {
        let model = try VNCoreMLModel(for: Inceptionv3().model)
        let request = VNCoreMLRequest(model: model, completionHandler: self.handleClassification)
        // Crop the input to its center square before scaling it to the model's input size.
        request.imageCropAndScaleOption = .centerCrop
        return request
    } catch {
        fatalError("can't load Vision ML model: \(error)")
    }
}()
Svitlana asked Jun 22 '17 16:06


2 Answers

A pure classifier model can only answer "what is this a picture of?", not detect and locate objects in the picture. All the free models on the Apple developer site (including Inception v3) are of this kind.

When Vision works with such a model, it identifies the model as a classifier based on the outputs declared in the MLModel file, and returns VNClassificationObservation objects as output.

If you find or create a model that's trained to both identify and locate objects, you can still use it with Vision. When you convert that model to Core ML format, the MLModel file will describe multiple outputs. When Vision works with a model that has multiple outputs, it returns an array of VNCoreMLFeatureValueObservation objects — one for each output of the model.

How the model declares its outputs would determine which feature values represent what. A model that reports a classification and a bounding box could output a string and four doubles, or a string and a multi array, etc.
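As a rough sketch of what handling such results might look like: the handler below walks the VNCoreMLFeatureValueObservation array and reports the type of each output. The names `describe` and `handleDetection` are hypothetical, and which output holds the label or the box coordinates is an assumption you would need to check against your own model's declared outputs.

```swift
import CoreML
import Vision

// Summarize one model output. Which output carries the label and which the
// box depends on the model; nothing here assumes a particular layout.
func describe(_ value: MLFeatureValue) -> String {
    switch value.type {
    case .string:
        return "label: \(value.stringValue)"
    case .double:
        return "number: \(value.doubleValue)"
    case .multiArray:
        let shape = value.multiArrayValue?.shape ?? []
        return "multi array with shape \(shape)"
    default:
        return "other output type"
    }
}

// Hypothetical completion handler for a VNCoreMLRequest built on a
// multi-output (non-classifier) model.
func handleDetection(request: VNRequest, error: Error?) {
    guard let observations = request.results as? [VNCoreMLFeatureValueObservation] else {
        print("results are not feature value observations")
        return
    }
    for observation in observations {
        print(describe(observation.featureValue))
    }
}
```

Turning those raw values into a rect is model-specific, so the decoding step is deliberately left out here.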

Addendum: Here's a model that works on iOS 11 and returns VNCoreMLFeatureValueObservation: TinyYOLO

rickster answered Nov 15 '22 06:11


That's because classifiers do not return object coordinates or frames. A classifier only gives a probability distribution over a list of categories.
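To make that concrete, here is a minimal sketch using a hypothetical `Prediction` struct as a stand-in for a classification result (it is not a Vision type). All the information available is a label and a confidence per category, so the most you can do is rank and filter them; there is no position to read.

```swift
// Hypothetical stand-in for one classifier result: a category and its probability.
struct Prediction {
    let identifier: String
    let confidence: Float
}

// Keep the most confident categories above a threshold, best first.
func topPredictions(_ predictions: [Prediction],
                    limit: Int,
                    minimumConfidence: Float) -> [Prediction] {
    let ranked = predictions
        .filter { $0.confidence > minimumConfidence }
        .sorted { $0.confidence > $1.confidence }
    return Array(ranked.prefix(limit))
}

// Made-up sample values, purely for illustration.
let sample = [
    Prediction(identifier: "dog", confidence: 0.25),
    Prediction(identifier: "cat", confidence: 0.7),
    Prediction(identifier: "car", confidence: 0.05),
]
for prediction in topPredictions(sample, limit: 2, minimumConfidence: 0.1) {
    print(prediction.identifier, prediction.confidence)
}
```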

What model are you using here?

Matthijs Hollemans answered Nov 15 '22 05:11