Create indicator matrix from two arrays in Python Numpy

Tags: python, numpy

Given two vectors, I would like to create an indicator matrix. For example, given a=np.array([5,5,3,4,4,4]) and b=np.array([5,4,3]), the result should be

   5 4 3
5  1 0 0
5  1 0 0
3  0 0 1
4  0 1 0
4  0 1 0
4  0 1 0

What is the simplest way to achieve this?

asked Jul 12 '17 17:07

David

1 Answer

Using NumPy broadcasting -

(a[:,None]==b).astype(int)

Sample run -

In [104]: a
Out[104]: array([5, 5, 3, 4, 4, 4])

In [105]: b
Out[105]: array([5, 4, 3])

In [106]: (a[:,None]==b).astype(int)
Out[106]: 
array([[1, 0, 0],
       [1, 0, 0],
       [0, 0, 1],
       [0, 1, 0],
       [0, 1, 0],
       [0, 1, 0]])

If by "simplest" you meant most compact, multiplying by 1 does the type conversion instead -

In [107]: (a[:,None]==b)*1
Out[107]: 
array([[1, 0, 0],
       [1, 0, 0],
       [0, 0, 1],
       [0, 1, 0],
       [0, 1, 0],
       [0, 1, 0]])
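
As a small aside on the type conversion, the sketch below checks that the two variants above agree, and shows an optional one-byte integer type for cases where the indicator matrix is large. The names via_astype, via_mult and compact are only illustrative, and the choice of np.uint8 is an assumption about what "compact" might mean for memory, not something the original answer prescribes.

```python
import numpy as np

a = np.array([5, 5, 3, 4, 4, 4])
b = np.array([5, 4, 3])

via_astype = (a[:, None] == b).astype(int)
via_mult = (a[:, None] == b) * 1

# Both variants hold the same 0/1 values in an integer array.
print(np.array_equal(via_astype, via_mult))

# Optional: store each entry in one byte instead of the default integer size.
compact = (a[:, None] == b).astype(np.uint8)
print(compact.dtype, compact.nbytes)   # uint8 18
```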

Explanation: np.newaxis is an alias for None; indexing with it inserts a new axis of length 1. So a[:,None] gives a 2D, single-column version of a with shape (6, 1). There are other ways to get this 2D version, a.reshape(-1,1) being one of them. Comparing this column against the 1D array b triggers broadcasting, so every element of a is compared with every element of b, giving a 2D boolean array of matches with shape (6, 3). The final step converts that boolean array to an int array.
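
To make the shapes in the explanation concrete, here is a minimal sketch that checks the equivalent ways of building the column version and the shape the comparison broadcasts to. The variable names (col, col_newaxis, col_reshape, mask) are only for illustration.

```python
import numpy as np

a = np.array([5, 5, 3, 4, 4, 4])
b = np.array([5, 4, 3])

# Three equivalent ways to turn the 1D array into a (6, 1) column.
col = a[:, None]
col_newaxis = a[:, np.newaxis]   # np.newaxis is the same object as None
col_reshape = a.reshape(-1, 1)

print(np.newaxis is None)        # True
print(col.shape, b.shape)        # (6, 1) (3,)

# Comparing (6, 1) with (3,) broadcasts both operands to (6, 3).
mask = col == b
print(mask.shape, mask.dtype)    # (6, 3) bool
```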

Step-by-step run -

In [141]: a
Out[141]: array([5, 5, 3, 4, 4, 4])

In [142]: b
Out[142]: array([5, 4, 3])

In [143]: a[:,None]
Out[143]: 
array([[5],
       [5],
       [3],
       [4],
       [4],
       [4]])

In [144]: a[:,None] == b
Out[144]: 
array([[ True, False, False],
       [ True, False, False],
       [False, False,  True],
       [False,  True, False],
       [False,  True, False],
       [False,  True, False]], dtype=bool)

In [145]: (a[:,None] == b).astype(int)
Out[145]: 
array([[1, 0, 0],
       [1, 0, 0],
       [0, 0, 1],
       [0, 1, 0],
       [0, 1, 0],
       [0, 1, 0]])
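
For reuse, the same idea can be wrapped in a small function. The sketch below is not from the original answer: indicator_matrix is a hypothetical helper name, and it swaps the explicit a[:,None] == b comparison for np.equal.outer, which performs the same all-pairs comparison.

```python
import numpy as np

def indicator_matrix(values, categories):
    """Return a (len(values), len(categories)) matrix of 0s and 1s.

    Hypothetical helper for illustration; not part of NumPy.
    """
    values = np.asarray(values)
    categories = np.asarray(categories)
    # Compare every element of values with every element of categories.
    return np.equal.outer(values, categories).astype(int)

a = np.array([5, 5, 3, 4, 4, 4])
b = np.array([5, 4, 3])

m = indicator_matrix(a, b)
print(m)

# When b holds unique values covering everything in a, each row has one 1.
print(m.sum(axis=1))   # [1 1 1 1 1 1]
# Column sums count how often each value of b occurs in a.
print(m.sum(axis=0))   # [2 3 1]
```

The row-sum check is a quick sanity test: a row summing to 0 would mean a value of a is missing from b, and a sum above 1 would mean b contains duplicates.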

answered Oct 02 '22 08:10

Divakar
