Equivalent of count list function in numpy array

Tags:

python

arrays

numpy

I have a matrix listScore with shape (100000, 2), and I would like to count all the rows identical to a given row. For instance, if listScore were a list of lists, I would simply do:

listScore.count([2,0])

to look for all the lists equal to [2,0]. I could obviously convert listScore to a list, but I want to keep the efficiency of numpy. Is there any function I could use to do the same thing?

Thanks in advance

214

asked Sep 28 '22 06:09

Dirty_Fox

1 Answer

If listScore is a NumPy array, you could compare every row against [2,0] and count the rows where all elements match:

import numpy as np
count = np.all(listScore == np.array([2, 0]), axis=1).sum()
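To make the intermediate steps visible, here is a minimal sketch on a toy array. The five rows are made up for illustration and simply stand in for the real (100000, 2) array.

```python
import numpy as np

# Toy stand-in for listScore (values are illustrative only).
listScore = np.array([[2, 0], [1, 3], [2, 0], [0, 2], [2, 1]])

# Broadcasting compares each row with [2, 0] element-wise,
# producing a boolean array of shape (5, 2).
elementwise = listScore == np.array([2, 0])

# A row matches only if every element in it matches.
row_mask = np.all(elementwise, axis=1)

# Summing a boolean mask counts the True entries.
count = row_mask.sum()
print(row_mask, count)
```

The same row_mask can also be used to index the array, for example listScore[row_mask], if the matching rows themselves are needed.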

If the array always has two columns, you can compare the two columns separately against 2 and 0 respectively, which avoids building the intermediate 2D boolean array, and get the count like so:

count = ((listScore[:, 0] == 2) & (listScore[:, 1] == 0)).sum()

If you are a fan of np.einsum, you might like to try this twisted one:

count = (~np.einsum('ij->i', listScore != [2, 0])).sum()
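The einsum version works because, as far as I know, np.einsum treats boolean input with logical OR in place of addition, so 'ij->i' behaves like any(axis=1) on the mismatch mask. A small sketch on made-up data that checks this assumption rather than taking it for granted:

```python
import numpy as np

# Toy stand-in for listScore (values are illustrative only).
listScore = np.array([[2, 0], [1, 3], [2, 0], [0, 2], [2, 1]])

# True wherever an element differs from the target row.
mismatch = listScore != [2, 0]

# On boolean input the row-wise "sum" stays boolean:
# True if the row has at least one mismatching element.
row_has_mismatch = np.einsum('ij->i', mismatch)

# Check the assumed equivalence with any(axis=1).
same_as_any = np.array_equal(row_has_mismatch, mismatch.any(axis=1))

# Rows without any mismatch are exactly the rows equal to [2, 0].
count = (~row_has_mismatch).sum()
print(same_as_any, count)
```

If the einsum result were an integer array instead, ~ would be a bitwise NOT and the count would be wrong, so the boolean dtype of the comparison matters here.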

Another performance-oriented solution could be SciPy's cdist, counting the rows whose Euclidean distance to [2,0] is exactly zero:

from scipy.spatial.distance import cdist
count = (cdist(listScore, np.atleast_2d([2, 0])) == 0).sum()
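A quick way to check that the approaches agree is to run them on synthetic data and compare against the plain-list count. This is only a sketch: the random array, the seed, and the counts dictionary are assumptions for illustration, and the cdist variant is skipped if SciPy is not installed.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice
# Synthetic scores in {0, 1, 2}, same shape as in the question.
listScore = rng.integers(0, 3, size=(100000, 2))
target = [2, 0]

counts = {
    "all": np.all(listScore == np.array(target), axis=1).sum(),
    "columns": ((listScore[:, 0] == 2) & (listScore[:, 1] == 0)).sum(),
    "einsum": (~np.einsum('ij->i', listScore != target)).sum(),
}

try:
    from scipy.spatial.distance import cdist
    # cdist converts to float; a distance of exactly 0 means an identical row.
    counts["cdist"] = (cdist(listScore, np.atleast_2d(target)) == 0).sum()
except ImportError:
    pass  # SciPy is optional for this check

# Reference value from the pure-Python approach in the question.
reference = listScore.tolist().count(target)
agree = all(int(c) == reference for c in counts.values())
print(agree, reference)
```

Which variant is fastest depends on the array size and dtype, so it is worth timing them on your own data, for instance with timeit.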

190

answered Oct 06 '22 01:10

Divakar
