Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can generating permutations be done in parallel?

I'm trying to figure out if I can speed up the generation of permutations. Specifically I'm using 8 out of [a-z] and I'd like to use 8 out of [a-zA-Z] and 8 out of [a-zA-Z0-9]. Which I know will quickly take a great deal of time and space.

Even just a permutation of length 8 from the lowercase ASCII characters takes a while and generates gigabytes. My problem is that I don't understand the underlying algorithm and so I can't begin to figure out if I can split the problem up into smaller tasks than I can join together later.

A python script I was using to generate a list of permutations:

import string
import itertools
from itertools import permutations

comb = itertools.permutations(string.ascii_lowercase, 8)

f = open('8letters.txt', 'w')
for x in comb:
        y = ''.join(x)
        f.write(y + '\n')

f.close()

Does anyone know how to partition this up into subtasks and put them together later? Is it possible at all?

I might just try a (possibly) faster way of doing it, but I'm having trouble with C++ and its std::next_permutation(), so I can't verify that it could speed things up even a little just yet.

If I can partition it up into 16 tasks, and run on 16 Xeon CPUs, then join the results, that would be great.

like image 541
James Young Avatar asked Nov 11 '18 15:11

James Young


1 Answers

If it were just permutations with replacement, it would be very easy: Just parallelize on the first letter of the string and then let each thread add the tail of the string. This would give you 26 independent tasks. If that is not enough, you could parallelize on the first two letters.

You want to have a permutation without replacement, so the problem does not trivially factorize. If one only wants to pick 8 letters from a set of 26, 52 and 62, one could do a naive brute thing: Parallelize on the first letter, let the thread just create the tail with replacement and discard the generated strings that contain duplicates. But when you want to pick 25 from 26 letters, this becomes really wasteful.

With that idea in mind, we can do better! We parallelize on the first letter of the string and then generate all permutations with seven elements from the set excluding the letter that we started with. This way we can have 26 tasks (or 52 or 62) and still use the algorithm. This could look like this:

# Use whatever chars you want as the set.
chars = set(string.ascii_lowercase)

# We iterate through all the possible heads. This exact loop will be
# parallelized in the end.
for head in chars:
    # Exclude the head from the set.
    remaining_chars = chars - set(head)

    for tail in itertools.permutations(remaining_chars, 7):
        string = head + ''.join(tail)

        # Store the string in your list/file.

In order to make use of multiple cores, we use a pool. For this we first need a function to map over. This is just the above a bit refactored:

def make_strings(head):
    remaining_chars = chars - set(head)
    strings = [
        head + ''.join(tail)
        for tail in itertools.permutations(remaining_chars, 7)]

    return strings

Now somewhere else we can create a pool and let it map over heads:

with multiprocessing.Pool() as pool:
    all_strings = pool.map(make_strings, chars)

The pool only got the needed __enter__ and __exit__ properties with Python 3, so I assume that we use that.

Once that is finished the flatten the list of lists into a plain list of strings:

strings = [
    string
    for sub_list in all_strings
    for string in sub_list]

Since 26 is an odd number for 16 cores, we could think about creating the heads with itertools.permutation(remaining_chars, 2) and then using the set subtraction to generate the last 6 digits.


This is a complete working script for Python 3 that summarizes everything:

import itertools
import multiprocessing
import string


chars = set(string.ascii_lowercase)


def make_strings(head):
    remaining_chars = chars - set(head)
    strings = [
        head + ''.join(tail)
        for tail in itertools.permutations(remaining_chars, 3)]

    return strings


def main():
    with multiprocessing.Pool() as pool:
        all_strings = pool.map(make_strings, chars)

    strings = [
        string
        for sub_list in all_strings
        for string in sub_list]

    print(strings[:100])


if __name__ == '__main__':
    main()
like image 89
Martin Ueding Avatar answered Nov 15 '22 13:11

Martin Ueding