How to use tf.image.crop_and_resize

I'm working on the ROI pooling layer for Fast R-CNN, and I normally use TensorFlow. I found that tf.image.crop_and_resize can act as the ROI pooling layer.

But I have tried many times and cannot get the result I expected. Or is the result I got actually the correct one?

Here is my code:

import cv2
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt 

img_path = r'F:\IMG_0016.JPG'
img = cv2.imread(img_path)
img = img.reshape([1,580,580,3])
img = img.astype(np.float32)
#img = np.concatenate([img,img],axis=0)

img_ = tf.Variable(img) # img shape is [1,580,580,3] after the reshape above
boxes = tf.Variable([[100,100,300,300],[0.5,0.1,0.9,0.5]])
box_ind = tf.Variable([0,0])
crop_size = tf.Variable([100,100])

#b = tf.image.crop_and_resize(img,[[0.5,0.1,0.9,0.5]],[0],[50,50])
c = tf.image.crop_and_resize(img_,boxes,box_ind,crop_size)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
a = c.eval(session=sess)

plt.imshow(a[0])
plt.imshow(a[1])

I have attached my original image and the results, a[0] and a[1].
If I am doing something wrong, can anyone show me how to use this function? Thanks.

R.igo asked Jan 27 '23 14:01


2 Answers

Actually, there's no problem with TensorFlow here.

From the documentation of tf.image.crop_and_resize (emphasis mine):

boxes: A Tensor of type float32. A 2-D tensor of shape [num_boxes, 4]. The i-th row of the tensor specifies the coordinates of a box in the box_ind[i] image and is specified in normalized coordinates [y1, x1, y2, x2]. A normalized coordinate value of y is mapped to the image coordinate at y * (image_height - 1), so as the [0, 1] interval of normalized image height is mapped to [0, image_height - 1] in image height coordinates. We do allow y1 > y2, in which case the sampled crop is an up-down flipped version of the original image. The width dimension is treated similarly. Normalized coordinates outside the [0, 1] range are allowed, in which case we use extrapolation_value to extrapolate the input image values.

The boxes argument needs normalized coordinates. That's why you get a black box with your first set of coordinates [100,100,300,300] (not normalized, and no extrapolation value provided), and not with your second set [0.5,0.1,0.9,0.5].
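As a minimal sketch of that conversion, assuming the boxes are given as pixel coordinates in [y1, x1, y2, x2] order: the helper name normalize_boxes is hypothetical and not part of TensorFlow. It divides by (size - 1) because the quoted documentation maps a normalized y to y * (image_height - 1).

```python
import numpy as np

def normalize_boxes(boxes_px, image_height, image_width):
    """Convert [y1, x1, y2, x2] pixel boxes to normalized coordinates.

    Hypothetical helper for illustration; not a TensorFlow function.
    """
    boxes_px = np.asarray(boxes_px, dtype=np.float32)
    # A normalized y maps to y * (image_height - 1), so the inverse
    # divides by (size - 1) rather than by size.
    scale = np.array(
        [image_height - 1, image_width - 1, image_height - 1, image_width - 1],
        dtype=np.float32,
    )
    return boxes_px / scale

# The pixel box from the question, on the 580x580 image.
boxes = normalize_boxes([[100, 100, 300, 300]], image_height=580, image_width=580)
print(boxes)
```

The resulting array can then be passed as the boxes argument in place of the raw pixel values.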

As for why matplotlib shows you gibberish on your second attempt, it is simply because you are using the wrong data type. Quoting the matplotlib documentation of plt.imshow (emphasis mine):

All values should be in the range [0 .. 1] for floats or [0 .. 255] for integers. Out-of-range values will be clipped to these bounds.

As you're using floats outside the [0,1] range, matplotlib clips your values to 1. That's why you get those colored pixels (solid red, solid green, solid blue, or a mix of these). Cast your array to uint8 to get an image that makes sense.

plt.imshow(a[1].astype(np.uint8))
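A small sketch of the two ranges plt.imshow accepts, assuming the crop holds float values on a 0 to 255 scale as in the question. The helper to_displayable and the sample array are hypothetical stand-ins for a[1].

```python
import numpy as np

def to_displayable(crop, as_uint8=True):
    """Return a copy of a float crop in a range that plt.imshow accepts.

    Hypothetical helper; assumes the input uses a 0..255 scale.
    """
    crop = np.clip(np.asarray(crop, dtype=np.float32), 0.0, 255.0)
    if as_uint8:
        return crop.astype(np.uint8)  # integers in [0, 255]
    return crop / 255.0               # floats in [0, 1]

# Stand-in for a[1]: a single pixel with three channels.
fake_crop = np.array([[[0.0, 127.5, 255.0]]], dtype=np.float32)
as_int = to_displayable(fake_crop)
as_float = to_displayable(fake_crop, as_uint8=False)
```

Either result can be handed to plt.imshow. Since cv2.imread returns channels in BGR order, reversing the last axis (for example crop[..., ::-1]) may also be needed for the colors to look right.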

Edit: As requested, I will dive a bit deeper into tf.image.crop_and_resize.

[When providing non-normalized coordinates and no extrapolation value], why do I just get a blank result?

Quoting the doc:

Normalized coordinates outside the [0, 1] range are allowed, in which case we use extrapolation_value to extrapolate the input image values.

So normalized coordinates outside [0,1] are allowed, but they still need to be normalized! With your example, [100,100,300,300], the coordinates you provide describe the red square, and your original image is the little green dot in the upper left corner. The default value of the extrapolation_value argument is 0, so the values outside the frame of the original image are filled with [0,0,0], hence the black.
[Image: crop_and_resize diagram showing the red square and the green dot]
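To make this concrete, here is a rough sketch along one axis. It assumes the op samples evenly spaced positions between the two box edges, after mapping each normalized coordinate c to c * (image_size - 1) as the documentation describes. The function names are hypothetical, and this is a simplified illustration rather than TensorFlow's actual implementation.

```python
import numpy as np

def sample_positions(lo, hi, image_size, crop_size):
    """Image-space positions sampled along one axis for box edges lo and hi."""
    # A normalized coordinate c maps to c * (image_size - 1).
    return np.linspace(lo * (image_size - 1), hi * (image_size - 1), crop_size)

def inside_fraction(lo, hi, image_size, crop_size):
    """Fraction of sampled positions that fall inside the image."""
    pos = sample_positions(lo, hi, image_size, crop_size)
    inside = (pos >= 0) & (pos <= image_size - 1)
    return inside.mean()

# First box from the question: every sample lands far outside the image.
bad = inside_fraction(100, 300, image_size=580, crop_size=100)
# Second box: every sample lands inside the image.
good = inside_fraction(0.5, 0.9, image_size=580, crop_size=100)
print(bad, good)
```

With the first box, the sampled positions start at 100 * 579 = 57900 pixels, so none of them touch the 580-pixel image and the whole crop is filled with the extrapolation value.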

But if your use case needs another value, you can provide it. The extrapolated pixels take the value extrapolation_value on each channel, which typically shows up as extrapolation_value % 256 once the array is cast to uint8 for display. This option is useful if the zone you need to crop is not fully included in your original images. (Sliding windows are one possible use case.)
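The effect of extrapolation_value on a window that only partly overlaps the data can be sketched in one dimension. This is a simplified NumPy analogue under the same even-sampling assumption, not the TensorFlow op itself. The function crop_and_resize_1d and the sample row are hypothetical.

```python
import numpy as np

def crop_and_resize_1d(signal, lo, hi, crop_size, extrapolation_value=0.0):
    """1-D analogue of a crop with linear interpolation and extrapolation.

    Hypothetical illustration; lo and hi are normalized coordinates.
    """
    signal = np.asarray(signal, dtype=np.float32)
    last = len(signal) - 1
    # Evenly spaced sample positions in index space.
    pos = np.linspace(lo * last, hi * last, crop_size)
    inside = (pos >= 0) & (pos <= last)
    # Interpolate at clamped positions, then overwrite the outside samples.
    values = np.interp(np.clip(pos, 0, last), np.arange(len(signal)), signal)
    return np.where(inside, values, extrapolation_value).astype(np.float32)

row = [10, 20, 30, 40, 50]
# A window that starts in the middle of the row and runs past its end.
out = crop_and_resize_1d(row, lo=0.5, hi=1.5, crop_size=5, extrapolation_value=-1.0)
print(out)
```

The samples that fall inside the row keep their interpolated values, and the ones past the end are replaced by the chosen extrapolation value.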

Lescurel answered Jan 30 '23 04:01


It seems that tf.image.crop_and_resize expects pixel values in the range [0,1].

Changing your code to

test = tf.image.crop_and_resize(image=image_np_expanded/255., ...)

solved the problem for me.

Krishna Choudhary answered Jan 30 '23 04:01
