Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to efficiently compare large lists in Python?

I am trying to find 9 letter words, that when you split evenly into 3 parts, and jumble around, you get another nine letter word.

for i in nineWordList:
    for j in nineWordList:
        if (i[3:5] + i[0:2] + i[6:8]) == j:
            correctWords.append(i)
        elif (i[3:5] + i[6:8] + i[0:2]) == j:
            correctWords.append(i)
        elif (i[0:2] + i[6:8] + i[3:5]) == j:
            correctWords.append(i)
        elif (i[6:8] + i[0:2] + i[3:5]) == j:
            correctWords.append(i)
        elif (i[6:8] + i[3:5] + i[0:2]) == j:
            correctWords.append(i)

This is how I do it. The only problem is nineWordList is 68,000 elements long, and this takes ages. How can I improve this, to make it more efficient?

like image 810
Melkor Avatar asked Jun 24 '26 14:06

Melkor


1 Answers

Use a set to avoid having to loop on two levels through the list:

nineWordSet = set(nineWordList)
for i in nineWordSet:
    if i[3:5] + i[0:2] + i[6:8] in nineWordSet:
        correctWords.append(i)
    elif i[3:5] + i[6:8] + i[0:2] in nineWordSet:
        correctWords.append(i)
    elif i[0:2] + i[6:8] + i[3:5] in nineWordSet:
        correctWords.append(i)
    elif i[6:8] + i[0:2] + i[3:5] in nineWordSet:
        correctWords.append(i)
    elif i[6:8] + i[3:5] + i[0:2] in nineWordSet:
        correctWords.append(i)

This will still have to loop through all those 68,000 entries (you obviously cannot avoid that) but in a first pass, it will convert the list into a set, so membership tests using in can be made in constant time. This gives you a linear time complexity instead of the quadratic time complexity that your nested loops had. Of course, the additional set will require more memory but that shouldn’t be a problem.


Btw. I believe your slicing is off. i[0:2] will not produce a 3-letter word (when you want to split a 9-letter word evenly):

>>> x = 'abcdefghi'
>>> x[0:2], x[3:5], x[6:8]
('ab', 'de', 'gh')

The second index in slices is always non-inclusive so you need to increase that by one:

>>> x[0:3], x[3:6], x[6:9]
('abc', 'def', 'ghi')

You can also shorten your conditions a bit by using itertools.permutations to generate those possible jumpled words. That way, your checks might be a bit nicer to the eye:

import itertools
nineWordSet = set(nineWordList)

for word in nineWordSet:
    for perm in itertools.permutations((word[0:3], word[3:6], word[6:9])):
        # skip the original permutation
        if perm == word:
            continue

        elif perm in nineWordSet:
            correctWords.append(word)

            # stop checking for more permutations
            break
like image 99
poke Avatar answered Jun 26 '26 04:06

poke



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!