
Finding the darkest region in a depth map using numpy and/or cv2

I am attempting to consistently find the darkest region in a series of depth map images generated from a video. The depth maps are generated using the PyTorch implementation here

Their sample run script generates a prediction of the same size as the input where each pixel is a floating point value, with the highest/brightest value being the closest. Standard depth estimation using ConvNets.

The depth prediction is then normalized as follows to make a PNG for review:

bits = 2
depth_min = prediction.min() 
depth_max = prediction.max()

max_val = (2**(8*bits))-1

out = max_val * (prediction - depth_min) / (depth_max - depth_min)
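
For reference, the same normalization can be wrapped in a small function that also picks an integer type wide enough for the chosen bit depth. This is only a sketch: the name normalize_depth, the restriction to bits of 1 or 2, and the synthetic prediction are assumptions for illustration. With bits = 2 the scaled values span 0 to 65535, so they need a 16-bit type; values above 255 are not representable in uint8.

```python
import numpy as np

def normalize_depth(prediction, bits=2):
    """Scale a float depth map to the full range of an unsigned integer type."""
    depth_min = prediction.min()
    depth_max = prediction.max()
    max_val = (2 ** (8 * bits)) - 1
    dtype = np.uint8 if bits == 1 else np.uint16  # assumption: bits is 1 or 2
    if depth_max - depth_min > np.finfo("float").eps:
        out = max_val * (prediction - depth_min) / (depth_max - depth_min)
    else:
        out = np.zeros(prediction.shape)  # flat map: avoid dividing by zero
    return out.astype(dtype)

# Synthetic stand-in for a real prediction (hypothetical data).
prediction = np.linspace(0.5, 3.0, 12).reshape(3, 4)
out16 = normalize_depth(prediction, bits=2)  # 16-bit version
out8 = normalize_depth(prediction, bits=1)   # 8-bit version, safe for uint8 APIs
```

If an 8-bit image is needed later, normalizing with bits=1 keeps dark pixels dark, which a direct cast of the 16-bit range to uint8 does not guarantee.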

I am attempting to identify the darkest region in each image in the video, with the assumption that this region has the most "open space".

I've tried several methods:

  • cv2 template matching

Using cv2 template matching and minMaxLoc, I created an all-zero 100x100 template, np.zeros((100, 100)), then applied it much as the docs do:

img2 = out.copy().astype("uint8")
template = np.zeros((100, 100)).astype("uint8")
w, h = template.shape[::-1]

res = cv2.matchTemplate(img2,template,cv2.TM_SQDIFF)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(res)
top_left = min_loc
bottom_right = (top_left[0] + w, top_left[1] + h)

val = out.max()
cv2.rectangle(out,top_left, bottom_right, int(val) , 2)

This implementation is very inconsistent, with many false positives.

  • np.argmin

Using np.argmin(out, axis=1), which generates many indices. I take the first two and write the word MIN at those coordinates:

text = "MIN"
textsize = cv2.getTextSize(text, font, 1, 2)[0] 
textX, textY = np.argmin(prediction, axis=1)[:2]
cv2.putText(out, text, (textX, textY), font, 1, (int(917*max_val), int(917*max_val), int(917*max_val)), 2)

np.argmin with a hack

This is less inconsistent but still lacking

  • np.argwhere

Using np.argwhere(prediction == np.min(prediction)), then writing the word MIN at the coordinates. I imagined this would give me the darkest pixel in the image, but this is not the case.

I've also thought of running a convolution with a 50x50 kernel, then taking the region with the smallest value as the darkest region.
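
The 50x50 convolution idea can be sketched as a sliding-window mean. The version below is a minimal NumPy-only sketch that uses an integral image (summed-area table) rather than an explicit convolution. The function name darkest_window and the synthetic image are illustrative assumptions, not part of the original code.

```python
import numpy as np

def darkest_window(img, size=50):
    """Return ((x, y), mean): the top-left corner of the size x size window
    with the smallest mean value, and that mean."""
    h, w = img.shape
    if h < size or w < size:
        raise ValueError("image is smaller than the window")
    # Integral image with a zero row/column prepended, so the sum over
    # img[r:r+size, c:c+size] is a combination of four table entries.
    integral = np.zeros((h + 1, w + 1), dtype=np.float64)
    integral[1:, 1:] = img.astype(np.float64).cumsum(axis=0).cumsum(axis=1)
    sums = (integral[size:, size:] - integral[:-size, size:]
            - integral[size:, :-size] + integral[:-size, :-size])
    means = sums / (size * size)
    r, c = np.unravel_index(np.argmin(means), means.shape)
    return (int(c), int(r)), float(means[r, c])

# Synthetic frame (hypothetical data): bright background, one large dark
# block, and a single isolated pixel that is the absolute minimum.
img = np.full((120, 160), 150, dtype=np.uint16)
img[30:80, 60:110] = 5
img[5, 5] = 0
top_left, mean_val = darkest_window(img, size=50)
```

Because the score is a mean over the whole window, the isolated zero pixel does not win. OpenCV's cv2.blur computes a similar box-filter mean, though with its own border handling and a center anchor rather than a top-left one.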

My question is: why are there inconsistencies and false positives, and how can I fix them? Intuitively, this seems like a very simple thing to do.

UPDATE Thanks to Hans for the idea. Please follow this link to download the output depths in png format.

Sam Hammamy asked Oct 14 '22 22:10


1 Answer

The minimum is usually not a single point but a larger area. argmin finds the first x and y (the top-left corner) of this area:

In case of multiple occurrences of the minimum values, the indices corresponding to the first occurrence are returned.
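
A small sketch on a synthetic array (hypothetical data) shows the difference between the per-row indices from axis=1, the first occurrence of the minimum, and the center of the minimum area. It also shows the index order: NumPy returns (row, col), whereas OpenCV drawing functions expect (x, y), i.e. (col, row).

```python
import numpy as np

# Synthetic 5x6 "depth map" whose minimum value 0 covers a 3x3 block.
depth = np.full((5, 6), 9)
depth[1:4, 2:5] = 0

# axis=1 gives one column index per row, not a single (x, y) pair.
per_row = np.argmin(depth, axis=1)

# Without an axis, argmin works on the flattened array; unravel_index
# turns the flat index back into (row, col) of the first minimum.
row, col = np.unravel_index(np.argmin(depth), depth.shape)

# All minimum pixels as (row, col) pairs; their mean is the area's center.
coords = np.argwhere(depth == depth.min())
center_row, center_col = coords.mean(axis=0)

# Swap to (x, y) order before passing positions to OpenCV.
first_min_xy = (int(col), int(row))
center_xy = (int(round(center_col)), int(round(center_row)))
```

Averaging the coordinates only gives a sensible center when the minimum pixels form a single connected area.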

What you need is the center of this minimum region, which you can find using moments. Sometimes there are multiple minimum regions, for instance in frame107.png. In that case we take the biggest one by finding the contour with the largest area.

We still have some jumping markers, because sometimes only a tiny area holds the minimum, e.g. in frame25.png. Therefore we use a minimum area threshold, min_area: we don't use the absolute minimum region but the region with the smallest value among all regions whose area is greater than or equal to that threshold.

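import glob

import cv2
import numpy as np

min_area = 500  # minimum number of pixels a gray level must cover

for file in glob.glob("*.png"):
    img = cv2.imread(file, cv2.IMREAD_GRAYSCALE)
    # Find the darkest gray level that covers at least min_area pixels.
    for i in range(int(img.min()), 256):
        if np.count_nonzero(img == i) >= min_area:
            b = (img == i).astype(np.uint8)
            break
    else:
        continue  # no gray level covers min_area pixels; skip this frame
    # Of all connected regions at that level, keep the one with the largest area.
    contours, _ = cv2.findContours(b, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    max_contour = max(contours, key=cv2.contourArea)
    # The centroid of that region follows from its image moments.
    m = cv2.moments(max_contour)
    if m["m00"] == 0:
        continue  # degenerate contour with zero area; no centroid
    x = int(m["m10"] / m["m00"])
    y = int(m["m01"] / m["m00"])
    out = cv2.circle(img, (x, y), 10, 255, 2)
    cv2.imwrite(file, out)  # note: overwrites the input frame in place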
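
As a possible variation, the search for the darkest gray level that covers at least min_area pixels can be done with a single histogram instead of one comparison per level. This is a sketch under the assumption of an 8-bit single-channel image; the helper name darkest_level_with_area and the synthetic image are hypothetical.

```python
import numpy as np

def darkest_level_with_area(img, min_area):
    """Return the smallest gray level covering at least min_area pixels,
    or None if no level does. Assumes an 8-bit single-channel image."""
    counts = np.bincount(img.ravel(), minlength=256)  # pixels per gray level
    levels = np.flatnonzero(counts >= min_area)       # levels that are large enough
    return int(levels[0]) if levels.size else None

# Synthetic frame (hypothetical data).
img = np.full((40, 40), 200, dtype=np.uint8)
img[0, 0:3] = 0          # tiny absolute-minimum region (3 pixels)
img[10:20, 10:20] = 7    # larger dark region (100 pixels)

level = darkest_level_with_area(img, min_area=50)
mask = (img == level).astype(np.uint8)  # binary mask to pass on to contour finding
```

Like the loop above, this counts all pixels of a gray level across the image, not the pixels of a single connected region.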

frame107 with five regions where the image is 0, shown with enhanced gamma.

frame25 with a very small min region (red arrow); we take the fifth largest min region instead (white circle).

The result (for min_area=500) is still a bit jumpy in places, but if you increase min_area further you'll get false results for frames with a very steeply descending dark area (and hence few pixels per value). Maybe you can use the time axis (frame number) to filter out frames where the location of the darkest region jumps back and forth within 3 frames.
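
One way to try the time-axis idea is a median filter over the marker positions of consecutive frames. The sketch below is an assumption about how that could look, not a tested part of the solution: median_smooth is a hypothetical helper, the positions are made up, and the median is taken separately for x and y.

```python
import numpy as np

def median_smooth(points, window=3):
    """Median-filter a sequence of (x, y) positions along the frame axis.
    window should be odd; a larger window suppresses longer jumps."""
    pts = np.asarray(points, dtype=np.float64)
    half = window // 2
    # Repeat the first and last positions so every frame has a full window.
    padded = np.pad(pts, ((half, half), (0, 0)), mode="edge")
    smoothed = np.empty_like(pts)
    for i in range(len(pts)):
        smoothed[i] = np.median(padded[i:i + window], axis=0)
    return smoothed

# Hypothetical marker positions for six frames; frame 2 jumps away briefly.
centers = [(100, 50), (101, 50), (300, 200), (102, 51), (103, 51), (104, 52)]
smoothed = median_smooth(centers, window=3)
```

A window of 3 removes jumps that last a single frame; jumps lasting two frames would need a window of 5, at the cost of a slower response to real movement.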

output for min_area=500

Stef answered Oct 18 '22 15:10