I don't have much knowledge of image processing. I am trying to implement a ConvNet. I downloaded some images as a data set and made their heights and widths equal. Then I tried loading them into a np.array with this code:
train_list = glob.glob(r'A:\Code\Machine Learning\CNN\ConvolutionalNN1\TrainImg\*.jpg')
X_train_orig = np.array([np.array(Image.open(file)) for file in train_list])
But it gave me an error: cannot broadcast (420,310) into (420,310,3). Then I printed the shapes of the arrays; some were (420,310,3), others (410,320,4). Why is that? And how can I change them so they fit into one array?
What is happening here is that you are dealing with three different image formats (at least among those that appear in your question). They are, respectively:

(420, 310, 3): three channels
(420, 310, 4): four channels
(420, 310): single channel

The third dimension that you are seeing represents the number of channels in your image (the first two being the height and width, respectively).
An example will clear this up further. I downloaded random images from the internet, each belonging to one of the three formats mentioned above:
RGB image dog.png
RGB-A image fish.png
Grayscale image lena.png
Here's a Python script that loads each of them using PIL and displays its shape:
from PIL import Image
import numpy as np

# RGB image: three channels
dog = Image.open('dog.png')
print('Dog shape is ' + str(np.array(dog).shape))

# RGB-A image: four channels
fish = Image.open('fish.png')
print('Fish shape is ' + str(np.array(fish).shape))

# Grayscale image: single channel
lena = Image.open('lena.png')
print('Lena shape is ' + str(np.array(lena).shape))
And here is the output:
Dog shape is (250, 250, 3)
Fish shape is (501, 393, 4)
Lena shape is (512, 512)
Hence, when you try to iteratively assign all the images to a single array (np.array), you get the shape-mismatch error.
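For intuition, here is a minimal sketch (with placeholder zero arrays standing in for your images) of how that mismatch surfaces; the exact exception text depends on your NumPy version:

import numpy as np

rgb_img = np.zeros((420, 310, 3))   # three-channel image
gray_img = np.zeros((420, 310))     # single-channel image

# Stacking arrays of different shapes in one call fails: older NumPy
# raises the broadcast ValueError from the question, newer NumPy
# complains about an "inhomogeneous shape".
X = np.array([rgb_img, gray_img])   # raises ValueError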
The easiest way to resolve this is to convert all the images to one particular format before saving them in the array. Assuming you will be using a pre-trained ImageNet model, we will convert them to RGB format (you can similarly choose another format of your choice). We will convert RGB-A to RGB using the following code:
fish = Image.open('fish.png')
print('Fish RGB-A shape is ' + str(np.array(fish).shape))
rgb = fish.convert('RGB')
print('Fish RGB shape is ' + str(np.array(rgb).shape))
Output is:
Fish RGB-A shape is (501, 393, 4)
Fish RGB shape is (501, 393, 3)
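The same convert('RGB') call also handles the single-channel case, which is what your (420, 310) images are: PIL replicates the gray values across three channels. Using the lena.png example from above:

lena = Image.open('lena.png')
print('Lena grayscale shape is ' + str(np.array(lena).shape))

lena_rgb = lena.convert('RGB')
print('Lena RGB shape is ' + str(np.array(lena_rgb).shape))

which prints (512, 512) and then (512, 512, 3).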
You can do the same for all your images, and then you will have a consistent number of channels (three in this case) across all of them.
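Putting it together with the loading code from your question, a minimal sketch (assuming the same train_list glob) would be:

import glob
import numpy as np
from PIL import Image

train_list = glob.glob(r'A:\Code\Machine Learning\CNN\ConvolutionalNN1\TrainImg\*.jpg')

# Force every image to RGB while loading, so each element has shape
# (height, width, 3) and stacking into one array succeeds.
X_train_orig = np.array([np.array(Image.open(file).convert('RGB'))
                         for file in train_list])

print(X_train_orig.shape)   # (num_images, 420, 310, 3) for your data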
NOTE: In my example, the spatial dimensions also vary between the images. In your case that is not an issue, as all of yours have the consistent dimensions (420, 310).
Hope this clarifies your doubt.