
How is the IoU metric calculated for multiple bounding box predictions in Tensorflow Object Detection API?

Asked by Nagarjun Gururaj on Nov 16 '22

1 Answer

Not sure exactly how TensorFlow does it, but here is one way that I recently got it to work, since I didn't find a good solution online. I used numpy arrays to get the IoU and other metrics (TP, FP, TN, FN) for multi-object detection.
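(For comparison: the conventional way to get the IoU of a single pair of axis-aligned boxes is directly from the corner coordinates, as intersection area over union area. A minimal sketch, using the same inclusive [[x1, y1], [x2, y2]] pixel convention as the example below:)

```python
def box_iou(a, b):
    # Boxes as [[x1, y1], [x2, y2]] with both corners inclusive,
    # matching how cv2.rectangle fills pixels in the example below.
    ix1, iy1 = max(a[0][0], b[0][0]), max(a[0][1], b[0][1])
    ix2, iy2 = min(a[1][0], b[1][0]), min(a[1][1], b[1][1])
    # Clamp to 0 so disjoint boxes get zero intersection
    inter = max(0, ix2 - ix1 + 1) * max(0, iy2 - iy1 + 1)

    def area(box):
        return (box[1][0] - box[0][0] + 1) * (box[1][1] - box[0][1] + 1)

    return inter / (area(a) + area(b) - inter)
```

Applied per object this gives a per-pair IoU; the mask approach below instead pools all objects into one image-level IoU, which is a different (stricter, aggregate) number.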

Let's say for this example that your image is 6x6.

import cv2
import numpy as np

empty = np.zeros(36).reshape([6, 6])

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

And you have the ground truth for 2 objects, one in the bottom left of the image and one smaller one in the top right.

bbox_actual_obj1 = [[0, 3], [2, 5]] # top left coord & bottom right coord
bbox_actual_obj2 = [[4, 0], [5, 1]]

Using OpenCV, you can add these objects to a copy of the empty image array.

actual = empty.copy()
actual = cv2.rectangle(
    actual,
    bbox_actual_obj1[0],
    bbox_actual_obj1[1],
    1,
    -1
)
actual = cv2.rectangle(
    actual,
    bbox_actual_obj2[0],
    bbox_actual_obj2[1],
    1,
    -1
)

array([[0., 0., 0., 0., 1., 1.],
       [0., 0., 0., 0., 1., 1.],
       [0., 0., 0., 0., 0., 0.],
       [1., 1., 1., 0., 0., 0.],
       [1., 1., 1., 0., 0., 0.],
       [1., 1., 1., 0., 0., 0.]])
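(As an aside: if you'd rather not depend on cv2, plain numpy slicing builds the same mask. Note that numpy indexes as [row, col], i.e. [y, x], and the slice ends need +1 because cv2.rectangle fills both corners inclusively:)

```python
import numpy as np

actual = np.zeros((6, 6))
# obj1: top-left (0, 3), bottom-right (2, 5) -> rows 3..5, cols 0..2
actual[3:6, 0:3] = 1
# obj2: top-left (4, 0), bottom-right (5, 1) -> rows 0..1, cols 4..5
actual[0:2, 4:6] = 1
```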

Now let's say that below are our predicted bounding boxes:

bbox_pred_obj1 = [[1, 3], [3, 5]] # top left coord & bottom right coord
bbox_pred_obj2 = [[3, 0], [5, 2]]

Now we do the same thing as above but change the value we assign within the array.

pred = empty.copy()
pred = cv2.rectangle(
    pred,
    bbox_pred_obj1[0],
    bbox_pred_obj1[1],
    2,
    -1
)
pred = cv2.rectangle(
    pred,
    bbox_pred_obj2[0],
    bbox_pred_obj2[1],
    2,
    -1
)

array([[0., 0., 0., 2., 2., 2.],
       [0., 0., 0., 2., 2., 2.],
       [0., 0., 0., 2., 2., 2.],
       [0., 2., 2., 2., 0., 0.],
       [0., 2., 2., 2., 0., 0.],
       [0., 2., 2., 2., 0., 0.]])

If we add these two arrays, we get the following result:

combined = actual + pred

array([[0., 0., 0., 2., 3., 3.],
       [0., 0., 0., 2., 3., 3.],
       [0., 0., 0., 2., 2., 2.],
       [1., 3., 3., 2., 0., 0.],
       [1., 3., 3., 2., 0., 0.],
       [1., 3., 3., 2., 0., 0.]])

Now all we need to do is count how many times each value occurs in the combined array to get the TP, FP, TN and FN counts.

unique, counts = np.unique(combined, return_counts=True)
zipped = dict(zip(unique, counts))

{0.0: 15, 1.0: 3, 2.0: 8, 3.0: 10}

Legend:

  • True Negative: 0
  • False Negative: 1
  • False Positive: 2
  • True Positive/Intersection: 3
  • Union: 1 + 2 + 3

IoU: 10 / (3 + 8 + 10) = 0.48
Precision: 10 / (10 + 8) = 0.56
Recall: 10 / (10 + 3) = 0.77
F1: 10 / (10 + 0.5 * (3 + 8)) = 0.65
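Those four figures can be sanity-checked in a few lines from the value counts (the literal counts from the `zipped` dict above are inlined here so the snippet stands alone):

```python
counts = {0.0: 15, 1.0: 3, 2.0: 8, 3.0: 10}  # TN, FN, FP, TP from this example
tn, fn, fp, tp = (counts[k] for k in (0.0, 1.0, 2.0, 3.0))

iou = tp / (fn + fp + tp)           # intersection / union
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = tp / (tp + 0.5 * (fn + fp))
```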

Answered by Austin Ulfers on Apr 26 '23