NumPy: How to avoid this loop?

Is there a way to avoid this loop in order to optimize the code?

import numpy as np

cLoss = 0
dist_ = np.array([0,1,0,1,1,0,0,1,1,0]) # just an example, longer in reality
TLabels = np.array([-1,1,1,1,1,-1,-1,1,-1,-1]) # just an example, longer in reality
t = float(dist_.size)
for i in range(len(dist_)):
    # labels of every element that shares element i's dist_ value
    labels = TLabels[dist_ == dist_[i]]
    cLoss += 1 - TLabels[i] * (np.sum(labels) / t)
print(cLoss)

Note: dist_ and TLabels are both numpy arrays with the same shape (t,1)

asked Jun 07 '15 10:06

2 Answers

I am not sure exactly what you want to do, but are you aware of the measurement functions in scipy.ndimage (also reachable as scipy.ndimage.measurements in older SciPy releases) for computing on arrays with labels? It looks like you want something like:

import scipy.ndimage

cLoss = len(dist_) - sum(TLabels * scipy.ndimage.sum(TLabels, dist_, dist_) / len(dist_))
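
For comparison, here is a minimal NumPy-only sketch of the same idea (one label sum per distinct dist_ value, gathered back to every element). The function names closs_vectorized and closs_loop are made up for illustration, and the code assumes the arrays can be flattened to 1-D; it is one possible way to do this, not necessarily the fastest.

```python
import numpy as np

def closs_vectorized(dist_, TLabels):
    """Loop-free cLoss (hypothetical helper name, not from the question)."""
    d = np.asarray(dist_).ravel()                  # accept shape (t,) or (t, 1)
    y = np.asarray(TLabels, dtype=float).ravel()
    t = float(d.size)
    # inverse[i] is the position of d[i] among the sorted distinct values
    _, inverse = np.unique(d, return_inverse=True)
    inverse = inverse.ravel()
    # group_sums[g] = sum of the labels whose dist_ value belongs to group g
    group_sums = np.bincount(inverse, weights=y)
    # each element contributes 1 - y[i] * group_sums[group of i] / t
    return d.size - np.sum(y * group_sums[inverse]) / t

def closs_loop(dist_, TLabels):
    """The loop from the question, kept only to check the result."""
    d = np.asarray(dist_).ravel()
    y = np.asarray(TLabels, dtype=float).ravel()
    t = float(d.size)
    total = 0.0
    for i in range(d.size):
        total += 1 - y[i] * (np.sum(y[d == d[i]]) / t)
    return total

dist_ = np.array([0, 1, 0, 1, 1, 0, 0, 1, 1, 0])
TLabels = np.array([-1, 1, 1, 1, 1, -1, -1, 1, -1, -1])
fast = closs_vectorized(dist_, TLabels)
slow = closs_loop(dist_, TLabels)
print(fast, slow)
```

Both calls should agree up to floating-point rounding; the vectorized version does the per-group work once instead of once per element.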

answered Oct 11 '22 03:10

Thomas Baruchel

My first question: what is labels at each step in the loop?

With dist_ = array([2,1,2]) and TLabels=array([1,2,3])

I get

[-1  1]
[1]
[-1  1]

The different lengths immediately raise a warning flag: it may be difficult to vectorize this.

With the longer arrays in the edited example

[-1  1 -1 -1 -1]
[ 1  1  1  1 -1]
[-1  1 -1 -1 -1]
[ 1  1  1  1 -1]
[ 1  1  1  1 -1]
[-1  1 -1 -1 -1]
[-1  1 -1 -1 -1]
[ 1  1  1  1 -1]
[ 1  1  1  1 -1]
[-1  1 -1 -1 -1]

The labels vectors are all the same length. Is that normal, or just a coincidence of values?

Drop a couple of elements off of dist_, and labels are:

for i in range(len(dist_)):
    labels = TLabels[dist_ == dist_[i]]
    v = 1. * np.sum(labels) / t
    v1 = 1 - TLabels[i] * v
    print(labels, v, TLabels[i], v1)
    cLoss += v1

(array([-1,  1, -1, -1]), -0.25, -1, 0.75)
(array([1, 1, 1, 1]), 0.5, 1, 0.5)
(array([-1,  1, -1, -1]), -0.25, 1, 1.25)
(array([1, 1, 1, 1]), 0.5, 1, 0.5)
(array([1, 1, 1, 1]), 0.5, 1, 0.5)
(array([-1,  1, -1, -1]), -0.25, -1, 0.75)
(array([-1,  1, -1, -1]), -0.25, -1, 0.75)
(array([1, 1, 1, 1]), 0.5, 1, 0.5)

The labels arrays can differ in length from group to group, but there are really only a few calculations: one v value for each distinct dist_ value.

Without working out all the details, it looks like you are just calculating labels*labels for each distinct dist_ value, and then summing those.
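
Working the algebra one step further (my own reading of the loop, so treat it as a sketch): if S_g is the sum of the TLabels values in group g, each element i of that group contributes 1 - TLabels[i]*S_g/t. Summing over the group gives S_g*S_g/t for the second term, so the whole loss collapses to n - (sum over groups of S_g**2)/t, with n = len(dist_). The helper name closs_closed_form below is hypothetical.

```python
import numpy as np

def closs_closed_form(dist_, TLabels):
    """cLoss as n - sum_g(S_g**2) / t (hypothetical helper name)."""
    d = np.asarray(dist_).ravel()
    y = np.asarray(TLabels, dtype=float).ravel()
    t = float(d.size)
    # one label sum S_g per distinct dist_ value
    sums = [y[d == value].sum() for value in np.unique(d)]
    return d.size - sum(s * s for s in sums) / t

# the shortened 8-element arrays used in the printout above
dist_ = np.array([0, 1, 0, 1, 1, 0, 0, 1])
TLabels = np.array([-1, 1, 1, 1, 1, -1, -1, 1])
result = closs_closed_form(dist_, TLabels)
print(result)  # 5.5, the same total as adding up the v1 column above
```

This still loops, but only over the distinct dist_ values rather than over every element.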

This looks like a group-by problem. You want to divide dist_ into groups with a common value, and sum some function of their corresponding TLabels values. Python's itertools has a groupby function, and so does pandas. itertools.groupby only groups consecutive equal keys, so it requires you to sort dist_ first; pandas groupby does not.

Try sorting dist_ and see if that adds any clarity to the problem.
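
A rough sketch of the sort-then-group idea with itertools.groupby, in plain Python so the grouping is easy to see. The function name closs_groupby is made up for illustration, and this version is meant to clarify the structure of the problem rather than to be fast.

```python
from itertools import groupby
from operator import itemgetter

def closs_groupby(dist_, TLabels):
    """cLoss via sort + itertools.groupby (hypothetical helper name)."""
    # groupby only merges consecutive equal keys, so sort by dist_ value first
    pairs = sorted(zip(dist_, TLabels), key=itemgetter(0))
    t = float(len(pairs))
    total = 0.0
    for _, group in groupby(pairs, key=itemgetter(0)):
        labels = [label for _, label in group]
        group_sum = sum(labels)          # the single sum shared by this group
        total += sum(1 - label * group_sum / t for label in labels)
    return total

dist_ = [0, 1, 0, 1, 1, 0, 0, 1, 1, 0]
TLabels = [-1, 1, 1, 1, 1, -1, -1, 1, -1, -1]
loss = closs_groupby(dist_, TLabels)
print(loss)
```

After sorting, each group is a contiguous run, and the sum of its labels is computed once per group instead of once per element.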

answered Oct 11 '22 01:10

hpaulj

Tags: python, optimization, numpy

farhawa
