Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Iteratively generating a permutation of natural numbers

I have a somewhat unusual question, that may or may not have been asked before (I did not find anything though, however I might just have looked for the wrong buzzwords).

My task is pretty simple: Given the "list" of natural numbers up to an N [0, 1, 2, ... N - 1] I want to shuffle this sequence. E.g. when I input the number 4, one possible result would be [3, 0, 1, 2]. The randomness should be determinable by some seed (which however is standard to most PRNGs in common languages).

The naive approach would be to just instantiate an array of size N, fill it with the numbers and use any shuffling algorithm.

The problem however is, that the memory complexity of this approach is O(n), which in my special case is not tractable. My idea is, to write a generator that iteratively provides the numbers in the resulting list.

To be more precise I want some "algorithm" that provides the numbers in an iterative fashion. To be more precise, a conceptual class would look like this:

class Generator {
   // some state
   int nextNumber(...) {
      // some magic
   }
}

And calling the nextNumber method iteratively provides the numbers of the sequence (i.e. any permutation of [0, 1, ... N - 1]. Of course that state of this generator instance should have a better memory complexity than just O(n) again (I would gain nothing).

Is there any algorithm to do, what I desire?

like image 552
PfannkuchenXD Avatar asked Jul 18 '18 23:07

PfannkuchenXD


2 Answers

Here's a fairly simple implementation in Python 3 of Format-preserving encryption using a balanced Feistel network that I wrote almost 2 years ago. It can perform the kind of index permutation that you want for N up to 264 on a 32 bit system, or 2128 on a 64 bit build of Python. That's due to the size of the integer returned by the hash() function. See sys.hash_info to find the limits for your system. It wouldn't be hard to use a fancier hash function that can return values with greater bit length, but I didn't want to make this code more complicated or slower.

Update

I've made a few minor improvements to the previous version, and I've added some more information in the comments. Instead of using the low bits returned from the hash function, we use the high bits, which generally improves the randomness, especially for short bit lengths. I've also added another hash function, xxhash by Yann Collet, which works works much better than Python's hash for this application, especially for shorter bit lengths, although it is a little slower. The xxhash algorithm has a much higher avalanche effect than the built-in hash, so the resulting permutations tend to be more well-shuffled.

Although this code works for small values of stop it's more suited to handle stop >= 2**16. If you need to permute smaller ranges it's probably a Good Idea to just use random.shuffle on list(range(stop)). It will be faster, and it doesn't use that much RAM: list(range(2**16)) consumes about 1280 kilobytes on a 32 bit machine.

You will notice that I use a string to seed the random number generator. For this application we want the randomizer to have plenty of entropy, and using a big string (or bytes) is an easy way to do that, as the random module docs mention. Even so, this program can only produce a tiny fraction of all the possible permutations when stop is large. For stop == 35 there are 35! (35 factorial) different permutations, and 35! > 2132, but the total bit length of our keys is only 128, so they can't cover all those permutations. We could increase the number of Feistel rounds to get a little more coverage, but obviously that would be impractical for large values of stop.

''' Format preserving encryption using a Feistel network

    This code is *not* suitable for cryptographic use.

    See https://en.wikipedia.org/wiki/Format-preserving_encryption
    https://en.wikipedia.org/wiki/Feistel_cipher
    http://security.stackexchange.com/questions/211/how-to-securely-hash-passwords

    A Feistel network performs an invertible transformation on its input,
    so each input number produces a unique output number. The netword operates
    on numbers of a fixed bit width, which must be even, i.e., the numbers
    a particular network operates on are in the range(4**k), and it outputs a
    permutation of that range.

    To permute a range of general size we use cycle walking. We set the
    network size to the next higher power of 4, and when we produce a number
    higher than the desired range we simply feed it back into the network,
    looping until we get a number that is in range.

    The worst case is when stop is of the form 4**k + 1, where we need 4
    steps on average to reach a valid n. In the typical case, where stop is
    roughly halfway between 2 powers of 4, we need 2 steps on average.

    Written by PM 2Ring 2016.08.22
'''

from random import Random

# xxhash by Yann Collet. Specialised for a 32 bit number
# See http://fastcompression.blogspot.com/2012/04/selecting-checksum-algorithm.html

def xxhash_num(n, seed):
    n = (374761397 + seed + n * 3266489917) & 0xffffffff
    n = ((n << 17 | n >> 15) * 668265263) & 0xffffffff
    n ^= n >> 15
    n = (n * 2246822519) & 0xffffffff
    n ^= n >> 13
    n = (n * 3266489917) & 0xffffffff
    return n ^ (n >> 16)

class FormatPreserving:
    """ Invertible permutation of integers in range(stop), 0 < stop <= 2**64
        using a simple Feistel network. NOT suitable for cryptographic purposes.
    """
    def __init__(self, stop, keystring):
        if not 0 < stop <= 1 << 64:
            raise ValueError('stop must be <=', 1 << 64)

        # The highest number in the range
        self.maxn = stop - 1

        # Get the number of bits in each part by rounding
        # the bit length up to the nearest even number
        self.shiftbits = -(-self.maxn.bit_length() // 2)
        self.lowmask = (1 << self.shiftbits) - 1
        self.lowshift = 32 - self.shiftbits

        # Make 4 32 bit round keys from the keystring.
        # Create an independent random stream so we
        # don't intefere with the default stream.
        stream = Random()
        stream.seed(keystring)
        self.keys = [stream.getrandbits(32) for _ in range(4)]
        self.ikeys = self.keys[::-1]

    def feistel(self, n, keys):
        # Split the bits of n into 2 parts & perform the Feistel
        # transformation on them.
        left, right = n >> self.shiftbits, n & self.lowmask
        for key in keys:
            left, right = right, left ^ (xxhash_num(right, key) >> self.lowshift)
            #left, right = right, left ^ (hash((right, key)) >> self.lowshift) 
        return (right << self.shiftbits) | left

    def fpe(self, n, reverse=False):
        keys = self.ikeys if reverse else self.keys
        while True:
            # Cycle walk, if necessary, to ensure n is in range.
            n = self.feistel(n, keys)
            if n <= self.maxn:
                return n

def test():
    print('Shuffling a small number')
    maxn = 10
    fpe = FormatPreserving(maxn, 'secret key string')
    for i in range(maxn):
        a = fpe.fpe(i)
        b = fpe.fpe(a, reverse=True)
        print(i, a, b)

    print('\nShuffling a small number, with a slightly different keystring')
    fpe = FormatPreserving(maxn, 'secret key string.')
    for i in range(maxn):
        a = fpe.fpe(i)
        b = fpe.fpe(a, reverse=True)
        print(i, a, b)

    print('\nHere are a few values for a large maxn')
    maxn = 10000000000000000000
    print('maxn =', maxn)
    fpe = FormatPreserving(maxn, 'secret key string')
    for i in range(10):
        a = fpe.fpe(i)
        b = fpe.fpe(a, reverse=True)
        print('{}: {:19} {}'.format(i, a, b))

    print('\nUsing a set to test that there are no collisions...')
    maxn = 100000
    print('maxn', maxn)
    fpe = FormatPreserving(maxn, 'secret key string')
    a = {fpe.fpe(i) for i in range(maxn)}
    print(len(a) == maxn)

    print('\nTesting that the operation is bijective...')
    for i in range(maxn):
        a = fpe.fpe(i)
        b = fpe.fpe(a, reverse=True)
        assert b == i, (i, a, b)
    print('yes')

if __name__ == "__main__":
    test()

output

Shuffling a small number
0 4 0
1 2 1
2 5 2
3 9 3
4 1 4
5 3 5
6 7 6
7 0 7
8 6 8
9 8 9

Shuffling a small number, with a slightly different keystring
0 9 0
1 8 1
2 3 2
3 5 3
4 2 4
5 6 5
6 1 6
7 4 7
8 7 8
9 0 9

Here are a few values for a large maxn
maxn = 10000000000000000000
0: 7071024217413923554 0
1: 5613634032642823321 1
2: 8934202816202119857 2
3:  296042520195445535 3
4: 5965959309128333970 4
5: 8417353297972226870 5
6: 7501923606289578535 6
7: 1722818114853762596 7
8:  890028846269590060 8
9: 8787953496283620029 9

Using a set to test that there are no collisions...
maxn 100000
True

Testing that the operation is bijective...
yes
0 4
1 2
2 5
3 9
4 1
5 3
6 7
7 0
8 6
9 8

Here's how to use it to make a generator:

def ipermute(stop, keystring):
    fpe = FormatPreserving(stop, keystring)
    for i in range(stop):
        yield fpe.fpe(i)

for i, v in enumerate(ipermute(10, 'secret key string')):
    print(i, v)

output

0 4
1 2
2 5
3 9
4 1
5 3
6 7
7 0
8 6
9 8

It's reasonably fast (for Python), but it's definitely not suitable for cryptography. It could be made crypto-grade by increasing the number of Feistel rounds to at least 5 and by using a suitable cryptographic hash function, eg Blake2. Also, a cryptographic method would need to be used to produce the Feistel keys. Of course, one shouldn't write cryptographic software, unless you know exactly what you're doing, since it's just too easy to write code that's vulnerable to timing attacks, etc.

like image 77
PM 2Ring Avatar answered Nov 14 '22 02:11

PM 2Ring


What you are looking for is a pseudo-random permutation in the form of a function, say f, that maps numbers from 1 to N onto numbers from 1 to N in a pseudo-random bijective way. Then, to generate the nth number in a pseudo-random permutation, you just return f(n)

This is essentially the same problem as encryption. A block cipher with a key is a pseudo-random bijective function. If you feed it all possible plaintext blocks exactly once in some order, it will return all possible ciphertext blocks exactly once in a different, pseudo-random order.

So to solve a problem like yours, what you essentially do is create a cipher that works on numbers from 1 to N instead of 256-bit blocks or whatever. You can use the tools from cryptography to do this.

For example, you can construct your permutation function with a Feistel structure (https://en.wikipedia.org/wiki/Feistel_cipher) like this:

  1. Let W be floor(sqrt(N)), and let the input to the function be x
  2. If x < W^2, then divide x into 2 fields from 0 to W-1: h = floor(x/W) and l = x%W. Hash h to produce a value from 0 to W-1, and set l = (l+hash)%W. Then swap fields -- let x = l*W + h
  3. x = (x+(N-W^2))%N
  4. Repeat steps (2) and (3) some number of times. The more you do it, the more random the result looks. Step (3) ensures that x < W^2 will be true for many of the rounds.

Since this function consists of a number of steps, each of which will map the numbers from 0 to N-1 onto the numbers from 0 to N-1 in a bijective way, the entire function will also have this property. If you feed it the numbers from 0 to N-1, you'll get them back out in a pseudo-random order.

like image 22
Matt Timmermans Avatar answered Nov 14 '22 02:11

Matt Timmermans