I am new to both Python and TensorFlow. I am trying to run the object detection tutorial file from the TensorFlow Object Detection API, but I cannot find where to get the coordinates of the bounding boxes when objects are detected.
Relevant code:
    # The following processing is only for single image
    detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
    detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
I assume this is where the bounding boxes are drawn:
    # Visualization of the results of detection.
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        output_dict['detection_boxes'],
        output_dict['detection_classes'],
        output_dict['detection_scores'],
        category_index,
        instance_masks=output_dict.get('detection_masks'),
        use_normalized_coordinates=True,
        line_thickness=8)
    plt.figure(figsize=IMAGE_SIZE)
    plt.imshow(image_np)
I tried printing output_dict['detection_boxes'], but I am not sure what the numbers mean. There are a lot of them.
    array([[ 0.56213236,  0.2780568 ,  0.91445708,  0.69120586],
           [ 0.56261235,  0.86368728,  0.59286624,  0.8893863 ],
           [ 0.57073039,  0.87096912,  0.61292225,  0.90354401],
           [ 0.51422435,  0.78449738,  0.53994244,  0.79437423],
           ......
           [ 0.32784131,  0.5461576 ,  0.36972913,  0.56903434],
           [ 0.03005961,  0.02714229,  0.47211722,  0.44683522],
           [ 0.43143299,  0.09211366,  0.58121657,  0.3509962 ]], dtype=float32)
I found answers for similar questions, but I don't have a variable called boxes as they do. How can I get the coordinates?
I tried printing output_dict['detection_boxes'] but I am not sure what the numbers mean
You can check the code for yourself: visualize_boxes_and_labels_on_image_array is defined in the API's visualization utilities (the module imported as vis_util in the tutorial).
Note that you are passing use_normalized_coordinates=True. If you trace the function calls, you will see that your numbers, such as [0.56213236, 0.2780568, 0.91445708, 0.69120586], are the values [ymin, xmin, ymax, xmax], and that the image coordinates

    (left, right, top, bottom) = (xmin * im_width, xmax * im_width, ymin * im_height, ymax * im_height)

are computed by the function:
    def draw_bounding_box_on_image(image,
                                   ymin,
                                   xmin,
                                   ymax,
                                   xmax,
                                   color='red',
                                   thickness=4,
                                   display_str_list=(),
                                   use_normalized_coordinates=True):
        """Adds a bounding box to an image.

        Bounding box coordinates can be specified in either absolute (pixel) or
        normalized coordinates by setting the use_normalized_coordinates argument.

        Each string in display_str_list is displayed on a separate line above the
        bounding box in black text on a rectangle filled with the input 'color'.
        If the top of the bounding box extends to the edge of the image, the strings
        are displayed below the bounding box.

        Args:
          image: a PIL.Image object.
          ymin: ymin of bounding box.
          xmin: xmin of bounding box.
          ymax: ymax of bounding box.
          xmax: xmax of bounding box.
          color: color to draw bounding box. Default is red.
          thickness: line thickness. Default value is 4.
          display_str_list: list of strings to display in box
                            (each to be shown on its own line).
          use_normalized_coordinates: If True (default), treat coordinates
            ymin, xmin, ymax, xmax as relative to the image. Otherwise treat
            coordinates as absolute.
        """
        draw = ImageDraw.Draw(image)
        im_width, im_height = image.size
        if use_normalized_coordinates:
            (left, right, top, bottom) = (xmin * im_width, xmax * im_width,
                                          ymin * im_height, ymax * im_height)
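If you want the pixel coordinates yourself rather than only the drawn boxes, the same arithmetic can be applied to the whole array at once. The following is a minimal sketch, not part of the Object Detection API: the helper name boxes_to_pixels, its scores/min_score parameters, and the 1000x500 image size are made up for illustration. The 0.5 cutoff is an assumption meant to resemble the score threshold the visualization applies before drawing, which would explain why you see far more rows than drawn boxes.

```python
import numpy as np


def boxes_to_pixels(boxes, im_width, im_height, scores=None, min_score=0.5):
    """Convert normalized [ymin, xmin, ymax, xmax] rows to pixel
    (left, right, top, bottom) rows.

    Hypothetical helper: if `scores` is given, rows scoring below
    `min_score` (an assumed cutoff) are dropped first.
    """
    boxes = np.asarray(boxes, dtype=np.float32)
    if scores is not None:
        keep = np.asarray(scores) >= min_score
        boxes = boxes[keep]
    # Each column of the (N, 4) array becomes its own length-N vector.
    ymin, xmin, ymax, xmax = boxes.T
    # Same formula as in draw_bounding_box_on_image, applied to every row.
    return np.stack(
        [xmin * im_width, xmax * im_width, ymin * im_height, ymax * im_height],
        axis=1,
    )


# First row of the array printed in the question; the image size is made up.
example = np.array([[0.56213236, 0.2780568, 0.91445708, 0.69120586]],
                   dtype=np.float32)
pixel_boxes = boxes_to_pixels(example, im_width=1000, im_height=500)
print(pixel_boxes)  # one row per box: left, right, top, bottom
```

In the tutorial you would probably take the size from the image array itself, e.g. im_height, im_width = image_np.shape[:2], and pass output_dict['detection_boxes'] together with output_dict['detection_scores'].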