Drawing a cross on an image with OpenCV

Context: I am performing object localisation and want to implement an Inhibition-of-Return mechanism (i.e. drawing a black cross on the image where the red bounding box is after a trigger action).

Problem: I do not know how to accurately scale the bounding box (red) in relation to the original input (init_input). Once this scaling is correct, the black cross should land in the middle of the red bounding box.

My current code for this function is as follows:

def IoR(b, init_input, prev_coord):
    """
    Inhibition-of-Return mechanism.

    Marks the region of the image covered by
    the bounding box with a black cross.

    :param b:
        The current bounding box represented as [x1, y1, x2, y2].

    :param init_input:
        The initial input volume of the current episode.

    :param prev_coord:
        The previous state's bounding box coordinates (x1, y1, x2, y2)
    """
    x1, y1, x2, y2 = prev_coord
    width = 12
    x_mid = (b[2] + b[0]) // 2
    y_mid = (b[3] + b[1]) // 2

    # Define vertical rectangle coordinates
    ver_x1 = int(((x_mid) * IMG_SIZE / (x2 - x1)) - width)
    ver_x2 = int(((x_mid) * IMG_SIZE / (x2 - x1)) + width)
    ver_y1 = int((b[1]) * IMG_SIZE / (y2 - y1))
    ver_y2 = int((b[3]) * IMG_SIZE / (y2 - y1))

    # Define horizontal rectangle coordinates
    hor_x1 = int((b[0]) * IMG_SIZE / (x2 - x1))
    hor_x2 = int((b[2]) * IMG_SIZE / (x2 - x1))
    hor_y1 = int(((y_mid) * IMG_SIZE / (y2 - y1)) - width)
    hor_y2 = int(((y_mid) * IMG_SIZE / (y2 - y1)) + width)

    # Draw vertical rectangle
    cv2.rectangle(init_input, (ver_x1, ver_y1), (ver_x2, ver_y2), (0, 0, 0), -1)

    # Draw horizontal rectangle
    cv2.rectangle(init_input, (hor_x1, hor_y1), (hor_x2, hor_y2), (0, 0, 0), -1)

The desired effect can be seen below:

[Image: desired result]

Note: I believe the complexity in this problem arises because the image is resized (to 224, 224, 3) each time I take an action (and consequently move on to the next state). Therefore, the "anchor" that determines the scaling must be extracted from the previous state's scaling, which is shown in the following code:

def next_state(init_input, b_prime, g):
    """
    Returns the observable region of the next state.

    Formats the next state's observable region, defined
    by b_prime, to be of dimension (224, 224, 3). Adding 16
    additional pixels of context around the original bounding box.
    The ground truth box must be reformatted according to the
    new observable region.

    IMG_SIZE = 224

    :param init_input:
        The initial input volume of the current episode.

    :param b_prime:
        The subsequent state's bounding box.

    :param g: (init_g)
        The initial ground truth box of the target object.
    """

    # Determine the pixel coordinates of the observable region for the following state
    context_pixels = 16
    x1 = max(b_prime[0] - context_pixels, 0)
    y1 = max(b_prime[1] - context_pixels, 0)
    x2 = min(b_prime[2] + context_pixels, IMG_SIZE)
    y2 = min(b_prime[3] + context_pixels, IMG_SIZE)

    # Determine observable region
    observable_region = cv2.resize(init_input[y1:y2, x1:x2], (224, 224), interpolation=cv2.INTER_AREA)

    # Resize ground truth box
    g[0] = int((g[0] - x1) * IMG_SIZE / (x2 - x1))  # x1
    g[1] = int((g[1] - y1) * IMG_SIZE / (y2 - y1))  # y1
    g[2] = int((g[2] - x1) * IMG_SIZE / (x2 - x1))  # x2
    g[3] = int((g[3] - y1) * IMG_SIZE / (y2 - y1))  # y2

    return observable_region, g, (b_prime[0], b_prime[1], b_prime[2], b_prime[3])

Explanation:

At state t, the agent is predicting the location of the target object. The target object has a ground truth box (yellow in the image, dotted in the sketch), and the agent's current "localising box" is the red bounding box. Say that at state t the agent decides it is best to move right. The bounding box is moved to the right, and the next state t' is then determined by adding 16 pixels of context around the red bounding box, cropping the original image to this boundary, and upscaling the cropped image back to 224 x 224.

Say the agent is now confident that its prediction is accurate, so it chooses the trigger action. This means: end the current target object's localisation episode and place a black cross where the agent predicted the object was (i.e. in the middle of the red bounding box). Since the current state is zoomed in after the crop that followed the previous action, the bounding box must be rescaled with respect to the original image before the black cross can be drawn accurately.

In the context of my problem, the first rescaling between states is working perfectly well (the second code block in this post). However, scaling back to the original image and drawing the black cross is what I cannot seem to get my head around.

Here is an image which hopefully helps the explanation:

[Image: sketch example]

Here is the output of my current solution:

[Image: output of the current solution]

asked Jul 21 '18 11:07

Wizard

2 Answers

I think it's better to store the coordinates globally instead of chaining a series of upscales and downscales. Those are a headache, and precision may be lost to rounding.

That is, every time you detect something, you convert it to global (original image) coordinates first. I have written a small demo here, imitating your detection and trigger behavior.

Initial detection: [image]

Zoomed in, another detection: [image]

Zoomed back to the original scale, with the detection box in the correct location: [image]

Code:

import cv2
import matplotlib.pyplot as plt

IMG_SIZE = 224

im = cv2.cvtColor(cv2.imread('lena.jpg'), cv2.COLOR_BGR2GRAY)
im = cv2.resize(im, (IMG_SIZE, IMG_SIZE))

# Your detector results
detected_region = [
    [(10, 20)   , (80, 100)],
    [(50, 0)    , (220, 190)],
    [(100, 143)  , (180, 200)],
    [(110, 45)  , (180, 150)]
]

# Global states
x_scale = 1.0
y_scale = 1.0
x_shift = 0
y_shift = 0

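# Start with the whole image as the observable region.
# Slice ends are exclusive, so IMG_SIZE (not IMG_SIZE-1) keeps the initial scale at exactly 1.0
x1, y1 = 0, 0
x2, y2 = IMG_SIZE, IMG_SIZE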
for region in detected_region:
    # Detection
    x_scale = IMG_SIZE / (x2-x1)
    y_scale = IMG_SIZE / (y2-y1)
    x_shift = x1
    y_shift = y1

    cur_im = cv2.resize(im[y1:y2, x1:x2], (IMG_SIZE, IMG_SIZE))

    # Assuming the detector returns these results (in the current, zoomed-in view)
    cv2.rectangle(cur_im, region[0], region[1], 255)

    plt.imshow(cur_im)
    plt.show()

    # Zooming in, using part of your code
    context_pixels = 16
    x1 = max(region[0][0] - context_pixels, 0) / x_scale + x_shift
    y1 = max(region[0][1] - context_pixels, 0) / y_scale + y_shift
    x2 = min(region[1][0] + context_pixels, IMG_SIZE) / x_scale + x_shift
    y2 = min(region[1][1] + context_pixels, IMG_SIZE) / y_scale + y_shift

    x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)


# Assuming the detector confirms its choice here.
# Map the last detection back to global coordinates, this time with no padding
x1 = detected_region[-1][0][0] / x_scale + x_shift
y1 = detected_region[-1][0][1] / y_scale + y_shift
x2 = detected_region[-1][1][0] / x_scale + x_shift
y2 = detected_region[-1][1][1] / y_scale + y_shift
x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
print('Confirmed detection: ', x1, y1, x2, y2)

cv2.rectangle(im, (x1, y1), (x2, y2), (255, 0, 0))
plt.imshow(im)
plt.show()

This also avoids resizing an already-resized image, which might create more artifacts and worsen the detector's performance.
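The same idea can be condensed into two small helpers. This is only a sketch under assumptions: the names local_to_global and cross_rectangles are hypothetical, the crop is assumed to be stored in original-image pixels, and the half-width of 12 is borrowed from the question's width variable. It uses plain Python arithmetic so the mapping can be checked without OpenCV.

```python
IMG_SIZE = 224  # side length of every resized observable region


def local_to_global(box, crop, img_size=IMG_SIZE):
    """Map a box from the zoomed-in view back to original-image pixels.

    box:  (x1, y1, x2, y2) in the resized img_size x img_size view.
    crop: (cx1, cy1, cx2, cy2), the region of the original image that
          was cropped and resized to produce that view (assumed known).
    """
    cx1, cy1, cx2, cy2 = crop
    sx = (cx2 - cx1) / img_size  # original pixels per view pixel (x)
    sy = (cy2 - cy1) / img_size  # original pixels per view pixel (y)
    x1, y1, x2, y2 = box
    return (int(cx1 + x1 * sx), int(cy1 + y1 * sy),
            int(cx1 + x2 * sx), int(cy1 + y2 * sy))


def cross_rectangles(box, half_width=12):
    """Return the corner pairs of the vertical and horizontal bars
    of a cross centred in box (all in the same coordinate frame)."""
    x1, y1, x2, y2 = box
    x_mid = (x1 + x2) // 2
    y_mid = (y1 + y2) // 2
    vertical = ((x_mid - half_width, y1), (x_mid + half_width, y2))
    horizontal = ((x1, y_mid - half_width), (x2, y_mid + half_width))
    return vertical, horizontal


# Made-up example values: a 112 x 112 crop shown at 224 x 224
crop = (50, 30, 162, 142)
box_in_view = (56, 56, 168, 168)

box_global = local_to_global(box_in_view, crop)
vertical, horizontal = cross_rectangles(box_global)
print(box_global)  # (78, 58, 134, 114)
```

Each corner pair can then be passed to cv2.rectangle(init_input, pt1, pt2, (0, 0, 0), -1), as in the question, to draw the filled bars on the original image.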

answered Oct 18 '22 00:10

hkchengrex

Imagine a point (x, y) in a 500x500 image. Let it be (100, 200). After scaling the image to a different size, say 250x250, the correct way to scale the point is to take the current coordinate and compute new_coord = old_coord * NEW_SIZE / OLD_SIZE.

Thus, (100, 200) will be transformed to (50, 100).

If you replace your scaling based on x2 - x1 with this simpler rescaling formula, it should fix your problem.

Update: NEW_SIZE and OLD_SIZE may differ between the x and y coordinates, depending on the shapes of the original and final images, if they are rectangular rather than square.
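As a minimal sketch of that formula (scale_point is a hypothetical helper name, and sizes are assumed to be given as (width, height)), with a separate factor per axis to cover the rectangular case:

```python
def scale_point(point, old_size, new_size):
    """Rescale an (x, y) point from an image of old_size to one of new_size.

    Sizes are (width, height); each axis gets its own scale factor.
    """
    x, y = point
    old_w, old_h = old_size
    new_w, new_h = new_size
    return (x * new_w / old_w, y * new_h / old_h)


print(scale_point((100, 200), (500, 500), (250, 250)))  # (50.0, 100.0)
```

The result is left as floats; rounding to integer pixels is best done once, at the point of drawing, to avoid accumulating rounding error.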

answered Oct 18 '22 02:10

doodhwala

Tags:

python

image-processing

opencv