Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python - something faster than 2 nested for loops

def fancymatching(fname1, fname2):
#This function will do much smarter and fancy kinds of compares
    if (fname1 == fname2):
        return 1
    else:
        return 0

personlist = [
{ 
'pid':'1',
'fname':'john',
'mname':'a',
'lname':'smyth',
},{ 
'pid':'2',
'fname':'john',
'mnane':'a',
'lname':'smith',
},{ 
'pid':'3',
'fname':'bob',
'mname':'b',
'lname':'nope',
}
]

for person1 in personlist:
    for person2 in personlist:
        if person1['pid'] >= person2['pid']:
            #don't check yourself, or ones that have been
        continue
        if fancymatching(person1['fname'], person2['fname']):
            print (person1['pid'] + " matched " + person2['pid'])

I'm trying to improve on the idea of the above code. It works, but if personlist becomes very large (say millions) I feel there must be something faster than 2 for loops.

What the code is doing is taking a list of dictionaries and running a fancy fuzzy matching function on the values of each dictionary against each other dictionary. So it's not as simple as just comparing all the dictionaries to the other ones. I'd like a way to run a function on each dictionary, maybe 2 for loops is the right way to do this? Any suggestions would be helpful!

like image 969
sniperd Avatar asked Feb 16 '17 20:02

sniperd


People also ask

Can you have 3 nested loops?

There is no limitation on the chaining of loops. In the nested loop, the number of iterations will be equal to the number of iterations in the outer loop multiplied by the iterations in the inner loop. In each iteration of the outer loop inner loop execute all its iteration.

What is faster than for loop in Python?

A faster way to loop in Python is using built-in functions. In our example, we could replace the for loop with the sum function. This function will sum the values inside the range of numbers.

Are nested for loops faster?

HOWEVER, the conclusion is CONSISTENT: The nested loop is much FASTER. When the iteration time is 100^5, the difference is significant: 321.49 vs 210.05. There is about 1.53-time gap between them.

Are nested for loops slow in Python?

Effectively, yes, it's slow because it's a nested for loop, because of the size. Python's element in list operation works by just searching the entire list, element by element, for the one it wants.


1 Answers

You can use itertools.combinations which is essentially the same double loop but it iterates faster because it's written in C (that only reduces the constant factor, you still have the O(n**2) runtime behaviour) and you don't need the if person1['pid'] >= person2['pid']: continue anymore (that's built into the combinations function already).

from itertools import combinations

for person1, person2 in combinations(personlist, 2):
    print(person1['fname'], person2['fname'])

which prints:

('john', 'john')
('john', 'bob')
('john', 'bob')

However if your fancymatching allows it then you could also group (O(n) runtime) your values. For example in your case you only match identical 'fname'-values.

>>> matches = {}
>>> for person in personlist:
...     matches.setdefault(person['fname'], []).append(person)
>>> matches
{'bob': [{'fname': 'bob', 'lname': 'nope', 'mname': 'b', 'pid': '3'}],
 'john': [{'fname': 'john', 'lname': 'smyth', 'mname': 'a', 'pid': '1'}, 
          {'fname': 'john', 'lname': 'smith', 'mnane': 'a', 'pid': '2'}]}

But that's only possible if your fancymatching allows such a grouping. Which is True for your case but if it's more complicated it might not be.

like image 138
MSeifert Avatar answered Oct 13 '22 13:10

MSeifert