I was playing around with NumPy and Pillow and came across an interesting result that apparently showcases a pattern in the results of NumPy's random.random().
Here is the full code for generating and saving 100 of these images (with seed 0); the images above are the first four generated by this code.
import numpy as np
from PIL import Image

np.random.seed(0)
img_arrays = np.random.random((100, 256, 256, 3)) * 255
for i, img_array in enumerate(img_arrays):
    img = Image.fromarray(img_array, "RGB")
    img.save("{}.png".format(i))
The above are four different images created using PIL.Image.fromarray()
on four different NumPy arrays created using numpy.random.random((256, 256, 3)) * 255
to generate a 256 by 256 grid of RGB values in four different Python instances (the same thing also happens in the same instance).
I noticed that this only happens (in my limited testing) when the width and height of the image are powers of two; I am not sure how to interpret that.
Although it may be hard to see due to browser anti-aliasing (you can download the images and view them in image viewers with no anti-aliasing), there are clear purple-brown columns of pixels every 8th column starting from the 3rd column of every image. To make sure, I tested this on 100 different images and they all followed this pattern.
What is going on here? I am guessing that patterns like this are the reason that people always say to use cryptographically secure random number generators when true randomness is required, but is there a concrete explanation behind why this is happening in particular?
The “random” numbers generated by NumPy are not exactly random. They are pseudo-random … they approximate random numbers, but are 100% determined by the input and the pseudo-random number algorithm.
Python's random() method returns a random floating-point number between 0 and 1.
The random() function generates random floating-point numbers in the range [0.0, 1.0). (Note the brackets: the square bracket means 0 is included and the parenthesis means 1 is excluded.) It takes no parameters and returns values uniformly distributed between 0 and 1.
Most random data generated with Python is not fully random in the scientific sense of the word. Rather, it is pseudorandom: generated with a pseudorandom number generator (PRNG), which is essentially any algorithm for generating seemingly random but still reproducible data.
Don't blame NumPy, blame PIL / Pillow. ;) You're generating floats, but PIL expects integers, and its float-to-int conversion is not doing what we want. Further research is required to determine exactly what PIL is doing...
In the meantime, you can get rid of those lines by explicitly converting your values to unsigned 8-bit integers:
img_arrays = (np.random.random((100, 256, 256, 3)) * 255).astype(np.uint8)
As FHTMitchell notes in the comments, a more efficient form is
img_arrays = np.random.randint(0, 256, (100, 256, 256, 3), dtype=np.uint8)
Here's typical output from that modified code:
The PIL Image.fromarray function has a known bug, as described here. The behaviour you're seeing is probably related to that bug, but I guess it could be an independent one. ;)
FWIW, here are some tests and workarounds I did on the bug mentioned on the linked question.
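As a quick check of the two workarounds above, here is a minimal sketch that uses only NumPy (no Pillow call) and compares the dtype and value range each one produces. The variable names are just for illustration, and it uses a local RandomState rather than the global seed. One side effect worth knowing: truncating `random() * 255` should never produce the value 255, whereas `randint(0, 256, ...)` covers the full 0..255 range.

```python
import numpy as np

rng = np.random.RandomState(0)

# Workaround 1: scale floats in [0.0, 1.0) and truncate to 8-bit integers.
floats = rng.random_sample((256, 256, 3))
scaled = (floats * 255).astype(np.uint8)

# Workaround 2: draw 8-bit integers directly (upper bound 256 is exclusive).
direct = rng.randint(0, 256, (256, 256, 3), dtype=np.uint8)

# Both arrays are now one byte per channel, which is what an "RGB" image needs.
print(scaled.dtype, scaled.min(), scaled.max())
print(direct.dtype, direct.min(), direct.max())
```

Either array can then be passed to Image.fromarray as in the question; the difference in maximum value is unlikely to be visible, but it is a reason to prefer the randint form.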
I'm pretty sure the problem is to do with the dtype, but not for the reasons you think. Here is one with np.random.randint(0, 256, (1, 256, 256, 3), dtype=np.uint32) (note the dtype is not np.uint8):
Can you see the pattern ;)? PIL interprets 32 bit (4 byte) values (probably as 4 pixels RGBK) differently from 8 bit values (RGB for one pixel). (See PM 2Ring's answer).
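To see why 32-bit values would look different if their memory is read one byte at a time, here is a small NumPy-only sketch. It does not call Pillow; it only inspects the raw bytes of the array, and it forces little-endian layout (`"<u4"`) so the result does not depend on the machine. Whether Pillow reads the buffer exactly this way is an assumption, not something confirmed here.

```python
import numpy as np

rng = np.random.RandomState(0)

# Values 0..255 stored in 4-byte integers, forced to little-endian layout.
ints = rng.randint(0, 256, (256, 256, 3), dtype=np.uint32).astype("<u4")

# View the same memory as individual bytes, grouped 4 bytes per integer.
raw = ints.reshape(-1).view(np.uint8).reshape(-1, 4)

# Byte 0 carries the value; bytes 1..3 are padding zeros.
print(np.array_equal(raw[:, 0], ints.reshape(-1)))
print(raw[:, 1:].max())
```

So three out of every four bytes are zero. Anything that treats this buffer as a stream of 8-bit channel values would see mostly black with a regular pattern of coloured samples.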
Originally you were passing 64-bit float values; these are also going to be interpreted differently (and probably not in the way you intended).
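The same byte-level view may explain the stripes in the question. The sketch below ASSUMES that the consumer reads the first height * width * 3 bytes of the float64 buffer as packed RGB; that is a hypothesis about Pillow's behaviour, not a confirmed description of it. Under that assumption, each float occupies 8 bytes and each pixel 3 bytes, so the layout repeats every 24 bytes, i.e. every 8 pixels. For values in [2, 255) the most significant byte of a little-endian float64 is always 0x40, so that byte is almost constant.

```python
import numpy as np

rng = np.random.RandomState(0)
height, width = 256, 256

# Same data as in the question, forced to little-endian float64.
floats = (rng.random_sample((height, width, 3)) * 255).astype("<f8")

# ASSUMPTION: only the first height*width*3 raw bytes are used, as packed RGB.
raw = floats.reshape(-1).view(np.uint8)
fake_rgb = raw[: height * width * 3].reshape(height, width, 3)

# Per column, how often is the green channel exactly 0x40 (the float's top byte)?
green = fake_rgb[:, :, 1]
share = (green == 0x40).mean(axis=0)

# Columns where green is almost always 0x40 would show up as vertical stripes.
striped = np.flatnonzero(share > 0.9)
print(striped[:4])
```

By the arithmetic above, the green channel of column c holds byte (3c + 1) mod 8 of some float, which is the top byte exactly when c mod 8 == 2, so the striped columns should be 2, 10, 18, and so on: every 8th column starting from the third, as described in the question. The columns only line up vertically because a 256-pixel row is 768 bytes, a multiple of 8; with a width that is not a multiple of 8 the pattern would shift from row to row, which might be why it was only noticed for power-of-two sizes.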