I recently switched out using cv2 for Tensorflow's tf.image module for image processing. However, my validation accuracy dropped around 10%.
I believe the issue is related to
While these differences result in worse accuracy, the images seem to be human-indistinguishable when using plt.imshow(). For example, take Image #1 of the ImageNet Validation Dataset:
First issue:
However, after converting the tf tensor to BGR format, there are very slight differences at many pixels in the image.
Using tf.image.decode_jpeg and then converting to BGR
[[ 26 41 24 ..., 57 48 46]
[ 36 39 36 ..., 24 24 29]
[ 41 26 34 ..., 11 17 27]
...,
[ 71 67 61 ..., 106 105 100]
[ 66 63 59 ..., 106 105 101]
[ 64 66 58 ..., 106 105 101]]```
Using cv.imread
[[ 26 42 24 ..., 57 48 48]
[ 38 40 38 ..., 26 27 31]
[ 41 28 36 ..., 14 20 31]
...,
[ 72 67 60 ..., 108 105 102]
[ 65 63 58 ..., 107 107 103]
[ 65 67 60 ..., 108 106 102]]```
Second issue:
tf.image.resize_images
[[ 26. 25.41850281 35.73127747 ..., 81.85855103
59.45834351 49.82373047]
[ 38.33480072 32.90485001 50.90826797 ..., 86.28446198
74.88543701 20.16353798]
[ 51.27312469 26.86172867 39.52401352 ..., 66.86851501
81.12111664 33.37636185]
...,
[ 70.59472656 75.78851318
45.48100662 ..., 70.18637085
88.56777191 97.19295502]
[ 70.66964722 59.77249908 48.16699219 ..., 74.25527954
97.58244324 105.20263672]
[ 64.93395996 59.72298431 55.17600632 ..., 77.28720856
98.95108032 105.20263672]]```
cv2.resize
[[ 36 30 34 ..., 102 59 43]
[ 35 28 51 ..., 85 61 26]
[ 28 39 50 ..., 59 62 52]
...,
[ 75 67 34 ..., 74 98 101]
[ 67 59 43 ..., 86 102 104]
[ 66 65 48 ..., 86 103 105]]```
Here's a gist demonstrating the behavior just mentioned. It includes the full code for how I am processing the image.
So my main questions are:
Thank you!
As vijay m points out correctly, by changing the dct_method
to "INTEGER_ACCURATE" you will get the same uint8 image using cv2 or tf. The problem indeed seems to be the resizing method. I also tried to force Tensorflow to use the same interpolation method than cv2 uses by default (bilinear) but the results are still different. This might be the case, because cv2 does the interpolation on integer values and TensorFlow converts to float before interpolating. But this is only a guess. If you plot the pixel-wise difference between the resized image by TF and cv2 you'll get the following historgram:
Histrogramm of pixel-wise difference
As you can see, this looks pretty normal distributed. (Also I was surprised amount of pixel-wise difference). The problem of your accuracy drop could lie exactly here. In this paper Goodfellow et al. describe the effect of adversarial examples and classification systems. This problem here is something similar I think. If the original weights you use for your network were trained using some input pipeline which gives the results of the cv2 functions, the image from the TF input pipeline is something like an adversarial example.
(See the image on page 3 at the top for an example...I can't post more than two links.)
So in the end I think if you want to use the original network weights for the same data they trained the network on, you should stay with a similar/same input pipeline. If you use the weights to finetune the network on your own data, this should not be of a big concern, because you retrain the classification layer to work with the new input images (from the TF pipeline).
And @ Ishant Mrinal: Please have a look at the code the OP provided in the GIST. He is aware of the difference of BGR (cv2) and RGB (TF) and is converting the images to the same color space.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With