Double increment loop in Python

I have a quick (and probably trivial) question. I am trying to write a loop that increments two variables so that I can generate a heatmap showing how similar a set of files are to one another.

The idea is if I have 100 files, I would like to compare each of them to one another. Currently I repeat my comparisons (i.e. compare file 1 & 2 and then file 2 & 1) which is very inefficient. The current stripped down script I have is shown below:

for fileX in range(1, 4):
    for fileY in range(1, 4):
        print("X is " + str(fileX) + ", Y is " + str(fileY))

The output I obtain is something like this:

X is 1, Y is 1
X is 1, Y is 2
X is 1, Y is 3
X is 2, Y is 1
X is 2, Y is 2
X is 2, Y is 3
X is 3, Y is 1
X is 3, Y is 2
X is 3, Y is 3

Whereas what I am looking for is something like this:

X is 1, Y is 1 << not necessary since it is always 100 %
X is 1, Y is 2
X is 1, Y is 3
X is 2, Y is 2 << not necessary since it is always 100 %
X is 2, Y is 3
X is 3, Y is 3 << not necessary since it is always 100 %

The reason is that I have already compared files 1 & 2, 1 & 3 and 2 & 3 in earlier iterations. For a short list of files the repetition is not too bad, but for a hundred files it roughly doubles the number of comparisons. Skipping the repeats would speed up the comparison considerably, especially since the files I am comparing are usually quite large (~500K lines each).

I would appreciate any suggestions.

munieq11 asked Feb 08 '23 20:02



2 Answers

You can use the current value of the outer loop as the start of the inner loop's range, like this:

for fileX in range(1, 4):
    for fileY in range(fileX, 4):
        print("X is " + str(fileX) + ", Y is " + str(fileY))

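To also skip the equal ones (a file compared with itself), start the inner range one higher:

for fileX in range(1, 4):
    for fileY in range(fileX + 1, 4):
        print("X is " + str(fileX) + ", Y is " + str(fileY))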
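The same idea carries over from numbers to an actual list of files by looping over indices. This is only a minimal sketch: the file names below are hypothetical, and the `pairs` list stands in for wherever the real comparison would happen.

```python
# Hypothetical file names, used only for illustration.
files = ["a.txt", "b.txt", "c.txt", "d.txt"]

pairs = []
for i in range(len(files)):
    # Starting at i + 1 skips self-pairs and pairs already seen in reverse order.
    for j in range(i + 1, len(files)):
        pairs.append((files[i], files[j]))

for first, second in pairs:
    print(first + " vs " + second)
```

For 4 files this visits 6 pairs instead of the 16 that the full double loop would produce.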
Bas van Stein answered Feb 15 '23 11:02



Don't reinvent the wheel. If you need combinations, just use itertools.combinations:

import itertools

for fileX, fileY in itertools.combinations(range(1, 4), 2):
    print("X is " + str(fileX) + ", Y is " + str(fileY))

Output:

X is 1, Y is 2
X is 1, Y is 3
X is 2, Y is 3

Compared to the double for loop, this is somewhat more readable (the code tells you exactly what it does) and less prone to silly off-by-one errors and the like. It also works equally well with any sort of collection or iterable, not just with an ordered list of numbers.
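Since the goal in the question is a heatmap, each score only needs to be computed once and can then be mirrored across the diagonal. The sketch below is one possible way to do that, not a definitive implementation: the file names, the in-memory `contents` dictionary and the `similarity` function (a simple Jaccard ratio over sets of lines) are all assumptions made up for illustration; a real script would read the files and use its own comparison.

```python
import itertools

# Hypothetical data: file name -> set of lines (a stand-in for real file contents).
contents = {
    "a.txt": {"x", "y"},
    "b.txt": {"y", "z"},
    "c.txt": {"x", "y"},
}
files = sorted(contents)

def similarity(name_a, name_b):
    # Assumed stand-in metric: shared lines divided by all distinct lines.
    lines_a, lines_b = contents[name_a], contents[name_b]
    return len(lines_a & lines_b) / float(len(lines_a | lines_b))

n = len(files)
# The diagonal is always 1.0 (a file is 100 % similar to itself), so it is never computed.
matrix = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

comparisons = 0
for i, j in itertools.combinations(range(n), 2):
    score = similarity(files[i], files[j])
    matrix[i][j] = score  # upper triangle
    matrix[j][i] = score  # mirrored into the lower triangle
    comparisons += 1

for row in matrix:
    print(row)
```

With 3 files this performs 3 comparisons. For 100 files, `itertools.combinations` yields 100 * 99 / 2 = 4950 pairs, compared with 10000 iterations of the full double loop.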

tobias_k answered Feb 15 '23 09:02
