Why does MinMaxScaler add lines to my image?

I want to normalize the pixel values of an image to the range [0, 1] for each channel (R, G, B).

Minimal Example

#!/usr/bin/env python

import numpy as np
import scipy.misc
from sklearn import preprocessing

# Read the image as an (m, n, 3) uint8 array and display it.
original = scipy.misc.imread('Crocodylus-johnsoni-3.jpg')
scipy.misc.imshow(original)

transformed = np.zeros(original.shape, dtype=np.float64)

# Scale each color channel to [0, 1] with MinMaxScaler.
scaler = preprocessing.MinMaxScaler()
for channel in range(3):
    transformed[:, :, channel] = scaler.fit_transform(original[:, :, channel])
scipy.misc.imsave("transformed.jpg", transformed)

What happens

Taking https://commons.wikimedia.org/wiki/File:Crocodylus-johnsoni-3.jpg, I get the following "normalized" result:

[Transformed image: the crocodile photo with vertical stripes on the right side]

As you can see, there are vertical lines running from top to bottom on the right side. What happened there? It seems to me that the normalization went wrong. If so, how do I fix it?

asked Feb 07 '23 by Martin Thoma

1 Answer

In scikit-learn, a two-dimensional array with shape (m, n) is usually interpreted as a collection of m samples, with each sample having n features.

MinMaxScaler.fit_transform() transforms each feature, so each column of your array is transformed independently of the others. That results in the vertical "stripes" in the image.
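You can see the column-wise behavior on a small example (a made-up 3x2 array, not taken from your image):

import numpy as np
from sklearn import preprocessing

# Each column is rescaled to [0, 1] using its own minimum and maximum,
# independently of the other column.
a = np.array([[ 0.0, 10.0],
              [ 5.0, 30.0],
              [10.0, 20.0]])
print(preprocessing.MinMaxScaler().fit_transform(a))
# First column becomes 0, 0.5, 1; second column becomes 0, 1, 0.5.

In your script, each column of pixels in a channel is scaled by that column's own minimum and maximum, which is why neighboring columns end up with visibly different brightness.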

It looks like you intended to scale each color channel independently. To do that using MinMaxScaler, reshape the input so that each channel becomes one column. That is, if the original image has shape (m, n, 3), reshape it to (m*n, 3) before passing it to the fit_transform() method, and then restore the shape of the result to create the transformed array.

For example,

# Reshape to (m*n, 3), so each color channel becomes one column.
ascolumns = original.reshape(-1, 3)
t = scaler.fit_transform(ascolumns)
# Restore the original (m, n, 3) shape.
transformed = t.reshape(original.shape)

With this, transformed looks like this:

[Transformed image, visually identical to the original]

The image looks exactly like the original, because it turns out that in the array original, the minimum and maximum are 0 and 255, respectively, in each channel:

In [41]: original.min(axis=(0, 1))
Out[41]: array([0, 0, 0], dtype=uint8)

In [42]: original.max(axis=(0, 1))
Out[42]: array([255, 255, 255], dtype=uint8)

So in this case, all fit_transform does is scale every input value uniformly to the floating point range [0.0, 1.0], i.e. divide by 255. If the minimum or maximum were different in one of the channels, the transformed image would look different.
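As a quick sanity check (assuming the per-channel minima and maxima shown above; this line is my addition, not part of the original session):

# With per-channel min 0 and max 255, the scaled result is just original/255.
print(np.allclose(transformed, original / 255.0))   # expected: True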


By the way, it is not difficult to perform the transform using pure numpy. (I'm using Python 3, so in the following, the division automatically casts the result to floating point. If you are using Python 2, you'll need to convert one of the arguments to floating point, or use from __future__ import division.)

In [58]: omin = original.min(axis=(0, 1), keepdims=True)

In [59]: omax = original.max(axis=(0, 1), keepdims=True)

In [60]: xformed = (original - omin)/(omax - omin)

In [61]: np.allclose(xformed, transformed)
Out[61]: True

(One potential problem with that method is that it will produce NaNs, along with a divide-by-zero warning, if one of the channels is constant, because the corresponding value in omax - omin will be 0.)
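A minimal way to guard against that (my own sketch, not part of the answer above) is to replace a zero range with 1, so a constant channel comes out as all zeros instead of NaNs:

omin = original.min(axis=(0, 1), keepdims=True)
omax = original.max(axis=(0, 1), keepdims=True)
# Where a channel is constant, use a range of 1 to avoid dividing by zero;
# that channel is then mapped to 0 everywhere.
rng = np.where(omax > omin, omax - omin, 1)
xformed = (original - omin) / rng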

answered Mar 05 '23 by Warren Weckesser