Understanding == applied to a NumPy array

Tags:

python

numpy

python-2.7

I'm new to Python, and I am learning TensorFlow. In a tutorial using the notMNIST dataset, they give example code to transform the labels matrix to a one-of-n encoded array.

The goal is to take an array consisting of label integers 0...9, and return a matrix where each integer has been transformed into a one-of-n encoded array like this:

0 -> [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
1 -> [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
2 -> [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
...

The code they give to do this is:

# Map 0 to [1.0, 0.0, 0.0 ...], 1 to [0.0, 1.0, 0.0 ...]
labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)

However, I don't understand how this code does that at all. It looks like it just generates an array of integers in the range of 0 to 9, and then compares that with the labels matrix, and converts the result to a float. How does an == operator result in a one-of-n encoded matrix?

941

asked Apr 10 '16 05:04

Nimrand

2 Answers

There are a few things going on here: numpy's vector ops, adding a singleton axis, and broadcasting.

First, let's see how == does the magic.

Let's say we start with a simple label array. == behaves in a vectorized fashion, which means that we can compare the entire array with a scalar and get an array consisting of the values of each elementwise comparison. For example:

>>> labels = np.array([1,2,0,0,2])
>>> labels == 0
array([False, False,  True,  True, False], dtype=bool)
>>> (labels == 0).astype(np.float32)
array([ 0.,  0.,  1.,  1.,  0.], dtype=float32)

First we get a boolean array, and then we coerce to floats: False==0 in Python, and True==1. So we wind up with an array which is 0 where labels isn't equal to 0 and 1 where it is.

But there's nothing special about comparing to 0; we could compare to 1 or 2 instead and get similar results:

>>> (labels == 2).astype(np.float32)
array([ 0.,  1.,  0.,  0.,  1.], dtype=float32)

In fact, we could loop over every possible label and generate this array. We could use a listcomp:

>>> np.array([(labels == i).astype(np.float32) for i in np.arange(3)])
array([[ 0.,  0.,  1.,  1.,  0.],
       [ 1.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  1.]], dtype=float32)

but this doesn't really take advantage of numpy. What we want is to have each possible label compared with each element, in other words to compare

>>> np.arange(3)
array([0, 1, 2])

with

>>> labels
array([1, 2, 0, 0, 2])

And here's where the magic of numpy broadcasting comes in. Right now, labels is a 1-dimensional object of shape (5,). If we make it a 2-dimensional object of shape (5,1), then comparing it with the range, which has shape (3,), will "broadcast" the two against each other, and we'll get an output of shape (5,3) with the results of comparing each entry in the range with each element of labels.

First we can add an "extra" axis to labels using None (or np.newaxis), changing its shape:

>>> labels[:,None]
array([[1],
       [2],
       [0],
       [0],
       [2]])
>>> labels[:,None].shape
(5, 1)
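
As a small aside, the extra axis can be added in a few equivalent ways. The sketch below reuses the labels array from this answer; the variable names col_none, col_newaxis and col_reshape are made up for illustration.

```python
import numpy as np

labels = np.array([1, 2, 0, 0, 2])

# Three equivalent ways to turn shape (5,) into shape (5, 1)
col_none = labels[:, None]
col_newaxis = labels[:, np.newaxis]   # np.newaxis is just an alias for None
col_reshape = labels.reshape(-1, 1)   # -1 lets numpy infer the row count

print(np.newaxis is None)                      # True
print(col_none.shape, col_newaxis.shape, col_reshape.shape)
print(np.array_equal(col_none, col_reshape))   # True
```

Which spelling to use is mostly a matter of taste; np.newaxis tends to read more clearly to people who have not seen the None trick before.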

And then we can make the comparison (this is the transpose of the arrangement we were looking at earlier, but that doesn't really matter).

>>> np.arange(3) == labels[:,None]
array([[False,  True, False],
       [False, False,  True],
       [ True, False, False],
       [ True, False, False],
       [False, False,  True]], dtype=bool)
>>> (np.arange(3) == labels[:,None]).astype(np.float32)
array([[ 0.,  1.,  0.],
       [ 0.,  0.,  1.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 0.,  0.,  1.]], dtype=float32)
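
Putting the pieces together, the tutorial's one-liner could be wrapped in a small helper. This is only a sketch: the function name one_hot and the variable encoded are hypothetical, and it assumes the labels are integers in the range 0 to num_labels - 1.

```python
import numpy as np

def one_hot(labels, num_labels):
    """Return a (len(labels), num_labels) float32 one-hot matrix.

    Hypothetical helper; assumes integer labels in 0..num_labels-1.
    """
    labels = np.asarray(labels)
    # Shape (num_labels,) compared with shape (n, 1) broadcasts to (n, num_labels)
    return (np.arange(num_labels) == labels[:, None]).astype(np.float32)

encoded = one_hot([1, 2, 0, 0, 2], num_labels=3)
print(encoded.shape)          # (5, 3)
print(encoded.sum(axis=1))    # every row contains exactly one 1.0
print(encoded.argmax(axis=1)) # recovers the original labels
```

Taking argmax along axis 1 is a handy sanity check, since it should give back the labels you started with.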

Broadcasting in numpy is very powerful, and well worth reading up on.
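
If you want to check what shape a broadcast will produce without doing the comparison, np.broadcast can report it. The sketch below uses made-up names (classes, labels_col, labels_flat) and also shows that the two 1-dimensional shapes (3,) and (5,) do not broadcast, which is why the extra axis is needed.

```python
import numpy as np

classes = np.arange(3)                    # shape (3,)
labels_flat = np.array([1, 2, 0, 0, 2])   # shape (5,)
labels_col = labels_flat[:, None]         # shape (5, 1)

# np.broadcast reports the resulting shape without computing any values
result_shape = np.broadcast(classes, labels_col).shape
print(result_shape)   # (5, 3)

# Shapes (3,) and (5,) are incompatible, so this raises ValueError
try:
    np.broadcast(classes, labels_flat)
    mismatch = False
except ValueError:
    mismatch = True
print(mismatch)       # True
```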

136

answered Sep 21 '22 13:09

DSM

In short, == applied to a numpy array means applying element-wise == to the array. The result is an array of booleans. Here is an example:

>>> b = np.array([1,0,0,1,1,0])
>>> b == 1
array([ True, False, False,  True,  True, False], dtype=bool)

To count, say, how many 1s there are in b, you don't need to cast the array to float, i.e. the .astype(np.float32) can be omitted, because in Python bool is a subclass of int, so True == 1 and False == 0. So here is how you count how many ones are in b:

>>> np.sum((b == 1))
3

Or:

>>> np.count_nonzero(b == 1)
3
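
The same idea as a self-contained script, as a rough sketch. The names mask, count_sum, count_nz and fraction are made up here; the mean of a boolean mask gives the fraction of matching elements.

```python
import numpy as np

b = np.array([1, 0, 0, 1, 1, 0])
mask = (b == 1)                  # boolean array, True where b equals 1

print(isinstance(True, int))     # True: bool is a subclass of int

count_sum = int(mask.sum())              # True counts as 1 when summed
count_nz = int(np.count_nonzero(mask))   # same count, no arithmetic on values
fraction = float(mask.mean())            # share of elements equal to 1

print(count_sum, count_nz, fraction)     # 3 3 0.5
```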

answered Sep 21 '22 13:09

VirtualArchitect
