I have a quick (and probably very trivial) question for most of you. I am trying to use a loop that will increment two variables so I can generate a heatmap plot that will reveal the similarity of the files in a simple form.
The idea is that if I have 100 files, I would like to compare each of them with every other one. Currently I repeat my comparisons (i.e. compare file 1 & 2 and then file 2 & 1), which is very inefficient. The current stripped-down script I have is shown below:
for fileX in range(1, 4):
    for fileY in range(1, 4):
        print("X is " + str(fileX) + ", Y is " + str(fileY))
The output I obtain is something like this:
X is 1, Y is 1
X is 1, Y is 2
X is 1, Y is 3
X is 2, Y is 1
X is 2, Y is 2
X is 2, Y is 3
X is 3, Y is 1
X is 3, Y is 2
X is 3, Y is 3
Whereas what I am looking for is something like this:
X is 1, Y is 1 << not necessary since it is always 100 %
X is 1, Y is 2
X is 1, Y is 3
X is 2, Y is 2 << not necessary since it is always 100 %
X is 2, Y is 3
X is 3, Y is 3 << not necessary since it is always 100 %
The reason is that files 1 & 2, 1 & 3 and 2 & 3 have already been compared in an earlier iteration, so the reverse comparisons are redundant. For a short list of a few files this is not too bad, but for a hundred files it roughly doubles the number of comparisons. Skipping the repeats would speed things up considerably, especially since the files I am comparing are usually pretty large (~500K lines each).
I would appreciate any suggestions.
You can use the current value of the outer loop as the starting value of the inner loop's range, like this:
for fileX in range(1, 4):
    for fileY in range(fileX, 4):
        print("X is " + str(fileX) + ", Y is " + str(fileY))
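To also skip the pairs where both values are equal, start the inner range one higher:
for fileX in range(1, 4):
    for fileY in range(fileX + 1, 4):
        print("X is " + str(fileX) + ", Y is " + str(fileY))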
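To see how much work this saves, here is a minimal sketch that collects the pairs produced by the second variant. The helper name upper_triangle_pairs is made up for illustration and is not part of the original code.

```python
def upper_triangle_pairs(n):
    """Return every (x, y) with 1 <= x < y <= n, so each unordered pair appears once."""
    pairs = []
    for x in range(1, n + 1):
        # Start the inner loop one past x: skips (x, x) and everything already seen.
        for y in range(x + 1, n + 1):
            pairs.append((x, y))
    return pairs


print(upper_triangle_pairs(3))          # [(1, 2), (1, 3), (2, 3)]
print(len(upper_triangle_pairs(100)))   # 4950
```

For 100 files that is 4950 comparisons instead of the 10000 iterations of the full double loop.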
Don't reinvent the wheel. If you need combinations, just use itertools.combinations:
import itertools

for fileX, fileY in itertools.combinations(range(1, 4), 2):
    print("X is " + str(fileX) + ", Y is " + str(fileY))
Output:
X is 1, Y is 2
X is 1, Y is 3
X is 2, Y is 3
Compared to the double for loop, this is somewhat more readable (the code tells you exactly what it does) and less prone to silly off-by-one errors and the like. It also works equally well with any sort of collection or iterable, not just with an ordered list of numbers.
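Since the goal in the question is a heatmap, the same idea can fill a symmetric matrix by computing each pair once and mirroring the result. This is only a sketch under assumptions: the similarity function below (Jaccard similarity over sets of lines) is a hypothetical stand-in for whatever comparison you actually run, and short in-memory strings stand in for the real files.

```python
import itertools


def similarity(a, b):
    """Hypothetical stand-in metric: Jaccard similarity of the two sets of lines."""
    lines_a, lines_b = set(a.splitlines()), set(b.splitlines())
    union = lines_a | lines_b
    if not union:
        return 1.0  # two empty inputs are treated as identical
    return len(lines_a & lines_b) / float(len(union))


def similarity_matrix(contents):
    """Build an n x n matrix, calling similarity() once per unordered pair."""
    n = len(contents)
    # The diagonal is always 1.0 (a file compared with itself), so pre-fill it.
    matrix = [[1.0] * n for _ in range(n)]
    for i, j in itertools.combinations(range(n), 2):
        score = similarity(contents[i], contents[j])
        matrix[i][j] = score
        matrix[j][i] = score  # mirror instead of recomputing
    return matrix


contents = ["a\nb\nc", "a\nb\nd", "x\ny"]  # stand-ins for file contents
matrix = similarity_matrix(contents)
for row in matrix:
    print(row)
```

The mirrored assignment gives the plotting code a full matrix while the expensive comparison still runs only once per pair.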