Is there a way to avoid this loop so optimize the code?
import numpy as np
cLoss = 0
dist_ = np.array([0,1,0,1,1,0,0,1,1,0]) # just an example, longer in reality
TLabels = np.array([-1,1,1,1,1,-1,-1,1,-1,-1]) # just an example, longer in reality
t = float(dist_.size)
for i in range(len(dist_)):
labels = TLabels[dist_ == dist_[i]]
cLoss+= 1 - TLabels[i]*(1. * np.sum(labels)/t)
print cLoss
Note: dist_
and TLabels
are both numpy arrays with the same shape (t,1)
Use numpy. set_printoptions() to print an array without scientific notation. Call set_printoptions(suppress=True) to suppress scientific notation when printing.
The map() function is a replacement to a for a loop. It applies a function for each element of an iterable. The map() function accepts two arguments: A function that is applied for each element in the list (a lambda expression)
NumPy Arrays are faster than Python Lists because of the following reasons: An array is a collection of homogeneous data-types that are stored in contiguous memory locations. On the other hand, a list in Python is a collection of heterogeneous data types stored in non-contiguous memory locations.
Again, some have observed vectorize to be faster than normal for loops, but even the NumPy documentation states: “The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.”
I am not sure what you exactly want to do, but are you aware of scipy.ndimage.measurements
for computing on arrays with labels? It look like you want something like:
cLoss = len(dist_) - sum(TLabels * scipy.ndimage.measurements.sum(TLabels,dist_,dist_) / len(dist_))
I first wonder, what is labels
at each step in the loop?
With dist_ = array([2,1,2])
and TLabels=array([1,2,3])
I get
[-1 1]
[1]
[-1 1]
The different length immediately raise a warning flag - it may be difficult to vectorize this.
With the longer arrays in the edited example
[-1 1 -1 -1 -1]
[ 1 1 1 1 -1]
[-1 1 -1 -1 -1]
[ 1 1 1 1 -1]
[ 1 1 1 1 -1]
[-1 1 -1 -1 -1]
[-1 1 -1 -1 -1]
[ 1 1 1 1 -1]
[ 1 1 1 1 -1]
[-1 1 -1 -1 -1]
The labels
vectors are all the same length. Is that normal, or just a coincidence of values?
Drop a couple of elements off of dist_
, and labels
are:
In [375]: for i in range(len(dist_)):
labels = TLabels[dist_ == dist_[i]]
v = (1.*np.sum(labels)/t); v1 = 1-TLabels[i]*v
print(labels, v, TLabels[i], v1)
cLoss += v1
.....:
(array([-1, 1, -1, -1]), -0.25, -1, 0.75)
(array([1, 1, 1, 1]), 0.5, 1, 0.5)
(array([-1, 1, -1, -1]), -0.25, 1, 1.25)
(array([1, 1, 1, 1]), 0.5, 1, 0.5)
(array([1, 1, 1, 1]), 0.5, 1, 0.5)
(array([-1, 1, -1, -1]), -0.25, -1, 0.75)
(array([-1, 1, -1, -1]), -0.25, -1, 0.75)
(array([1, 1, 1, 1]), 0.5, 1, 0.5)
Again different lengths of labels, but really only a few calculations. There is 1 v
value for each different dist_
value.
Without working out all the details, it looks like you are just calculating labels*labels
for each distinct dist_
value, and then summing those.
This looks like a groupBy
problem. You want to divide the dist_
into groups with a common value, and sum some function of their corresponding TLabels
values. Python itertools
has a groupBy
function, so does pandas
. I think both require you to sort dist_
.
Try sorting dist_
and see if that adds any clarity to the problem.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With