I have a somewhat unusual question, that may or may not have been asked before (I did not find anything though, however I might just have looked for the wrong buzzwords).
My task is pretty simple: Given the "list" of natural numbers up to an N [0, 1, 2, ... N - 1] I want to shuffle this sequence. E.g. when I input the number 4, one possible result would be [3, 0, 1, 2]. The randomness should be determinable by some seed (which however is standard to most PRNGs in common languages).
The naive approach would be to just instantiate an array of size N, fill it with the numbers and use any shuffling algorithm.
The problem however is, that the memory complexity of this approach is O(n), which in my special case is not tractable. My idea is, to write a generator that iteratively provides the numbers in the resulting list.
To be more precise I want some "algorithm" that provides the numbers in an iterative fashion. To be more precise, a conceptual class would look like this:
class Generator {
// some state
int nextNumber(...) {
// some magic
}
}
And calling the nextNumber method iteratively provides the numbers of the sequence (i.e. any permutation of [0, 1, ... N - 1]. Of course that state of this generator instance should have a better memory complexity than just O(n) again (I would gain nothing).
Is there any algorithm to do, what I desire?
Here's a fairly simple implementation in Python 3 of Format-preserving encryption using a balanced Feistel network that I wrote almost 2 years ago. It can perform the kind of index permutation that you want for N up to 264 on a 32 bit system, or 2128 on a 64 bit build of Python. That's due to the size of the integer returned by the hash()
function. See sys.hash_info
to find the limits for your system. It wouldn't be hard to use a fancier hash function that can return values with greater bit length, but I didn't want to make this code more complicated or slower.
I've made a few minor improvements to the previous version, and I've added some more information in the comments. Instead of using the low bits returned from the hash function, we use the high bits, which generally improves the randomness, especially for short bit lengths. I've also added another hash function, xxhash by Yann Collet, which works works much better than Python's hash
for this application, especially for shorter bit lengths, although it is a little slower. The xxhash algorithm has a much higher avalanche effect than the built-in hash
, so the resulting permutations tend to be more well-shuffled.
Although this code works for small values of stop
it's more suited to handle stop >= 2**16
. If you need to permute smaller ranges it's probably a Good Idea to just use random.shuffle
on list(range(stop))
. It will be faster, and it doesn't use that much RAM: list(range(2**16))
consumes about 1280 kilobytes on a 32 bit machine.
You will notice that I use a string to seed the random number generator. For this application we want the randomizer to have plenty of entropy, and using a big string (or bytes
) is an easy way to do that, as the random
module docs mention. Even so, this program can only produce a tiny fraction of all the possible permutations when stop
is large. For stop == 35
there are 35! (35 factorial) different permutations, and 35! > 2132, but the total bit length of our keys is only 128, so they can't cover all those permutations. We could increase the number of Feistel rounds to get a little more coverage, but obviously that would be impractical for large values of stop
.
''' Format preserving encryption using a Feistel network
This code is *not* suitable for cryptographic use.
See https://en.wikipedia.org/wiki/Format-preserving_encryption
https://en.wikipedia.org/wiki/Feistel_cipher
http://security.stackexchange.com/questions/211/how-to-securely-hash-passwords
A Feistel network performs an invertible transformation on its input,
so each input number produces a unique output number. The netword operates
on numbers of a fixed bit width, which must be even, i.e., the numbers
a particular network operates on are in the range(4**k), and it outputs a
permutation of that range.
To permute a range of general size we use cycle walking. We set the
network size to the next higher power of 4, and when we produce a number
higher than the desired range we simply feed it back into the network,
looping until we get a number that is in range.
The worst case is when stop is of the form 4**k + 1, where we need 4
steps on average to reach a valid n. In the typical case, where stop is
roughly halfway between 2 powers of 4, we need 2 steps on average.
Written by PM 2Ring 2016.08.22
'''
from random import Random
# xxhash by Yann Collet. Specialised for a 32 bit number
# See http://fastcompression.blogspot.com/2012/04/selecting-checksum-algorithm.html
def xxhash_num(n, seed):
n = (374761397 + seed + n * 3266489917) & 0xffffffff
n = ((n << 17 | n >> 15) * 668265263) & 0xffffffff
n ^= n >> 15
n = (n * 2246822519) & 0xffffffff
n ^= n >> 13
n = (n * 3266489917) & 0xffffffff
return n ^ (n >> 16)
class FormatPreserving:
""" Invertible permutation of integers in range(stop), 0 < stop <= 2**64
using a simple Feistel network. NOT suitable for cryptographic purposes.
"""
def __init__(self, stop, keystring):
if not 0 < stop <= 1 << 64:
raise ValueError('stop must be <=', 1 << 64)
# The highest number in the range
self.maxn = stop - 1
# Get the number of bits in each part by rounding
# the bit length up to the nearest even number
self.shiftbits = -(-self.maxn.bit_length() // 2)
self.lowmask = (1 << self.shiftbits) - 1
self.lowshift = 32 - self.shiftbits
# Make 4 32 bit round keys from the keystring.
# Create an independent random stream so we
# don't intefere with the default stream.
stream = Random()
stream.seed(keystring)
self.keys = [stream.getrandbits(32) for _ in range(4)]
self.ikeys = self.keys[::-1]
def feistel(self, n, keys):
# Split the bits of n into 2 parts & perform the Feistel
# transformation on them.
left, right = n >> self.shiftbits, n & self.lowmask
for key in keys:
left, right = right, left ^ (xxhash_num(right, key) >> self.lowshift)
#left, right = right, left ^ (hash((right, key)) >> self.lowshift)
return (right << self.shiftbits) | left
def fpe(self, n, reverse=False):
keys = self.ikeys if reverse else self.keys
while True:
# Cycle walk, if necessary, to ensure n is in range.
n = self.feistel(n, keys)
if n <= self.maxn:
return n
def test():
print('Shuffling a small number')
maxn = 10
fpe = FormatPreserving(maxn, 'secret key string')
for i in range(maxn):
a = fpe.fpe(i)
b = fpe.fpe(a, reverse=True)
print(i, a, b)
print('\nShuffling a small number, with a slightly different keystring')
fpe = FormatPreserving(maxn, 'secret key string.')
for i in range(maxn):
a = fpe.fpe(i)
b = fpe.fpe(a, reverse=True)
print(i, a, b)
print('\nHere are a few values for a large maxn')
maxn = 10000000000000000000
print('maxn =', maxn)
fpe = FormatPreserving(maxn, 'secret key string')
for i in range(10):
a = fpe.fpe(i)
b = fpe.fpe(a, reverse=True)
print('{}: {:19} {}'.format(i, a, b))
print('\nUsing a set to test that there are no collisions...')
maxn = 100000
print('maxn', maxn)
fpe = FormatPreserving(maxn, 'secret key string')
a = {fpe.fpe(i) for i in range(maxn)}
print(len(a) == maxn)
print('\nTesting that the operation is bijective...')
for i in range(maxn):
a = fpe.fpe(i)
b = fpe.fpe(a, reverse=True)
assert b == i, (i, a, b)
print('yes')
if __name__ == "__main__":
test()
output
Shuffling a small number
0 4 0
1 2 1
2 5 2
3 9 3
4 1 4
5 3 5
6 7 6
7 0 7
8 6 8
9 8 9
Shuffling a small number, with a slightly different keystring
0 9 0
1 8 1
2 3 2
3 5 3
4 2 4
5 6 5
6 1 6
7 4 7
8 7 8
9 0 9
Here are a few values for a large maxn
maxn = 10000000000000000000
0: 7071024217413923554 0
1: 5613634032642823321 1
2: 8934202816202119857 2
3: 296042520195445535 3
4: 5965959309128333970 4
5: 8417353297972226870 5
6: 7501923606289578535 6
7: 1722818114853762596 7
8: 890028846269590060 8
9: 8787953496283620029 9
Using a set to test that there are no collisions...
maxn 100000
True
Testing that the operation is bijective...
yes
0 4
1 2
2 5
3 9
4 1
5 3
6 7
7 0
8 6
9 8
Here's how to use it to make a generator:
def ipermute(stop, keystring):
fpe = FormatPreserving(stop, keystring)
for i in range(stop):
yield fpe.fpe(i)
for i, v in enumerate(ipermute(10, 'secret key string')):
print(i, v)
output
0 4
1 2
2 5
3 9
4 1
5 3
6 7
7 0
8 6
9 8
It's reasonably fast (for Python), but it's definitely not suitable for cryptography. It could be made crypto-grade by increasing the number of Feistel rounds to at least 5 and by using a suitable cryptographic hash function, eg Blake2. Also, a cryptographic method would need to be used to produce the Feistel keys. Of course, one shouldn't write cryptographic software, unless you know exactly what you're doing, since it's just too easy to write code that's vulnerable to timing attacks, etc.
What you are looking for is a pseudo-random permutation in the form of a function, say f, that maps numbers from 1 to N onto numbers from 1 to N in a pseudo-random bijective way. Then, to generate the nth number in a pseudo-random permutation, you just return f(n)
This is essentially the same problem as encryption. A block cipher with a key is a pseudo-random bijective function. If you feed it all possible plaintext blocks exactly once in some order, it will return all possible ciphertext blocks exactly once in a different, pseudo-random order.
So to solve a problem like yours, what you essentially do is create a cipher that works on numbers from 1 to N instead of 256-bit blocks or whatever. You can use the tools from cryptography to do this.
For example, you can construct your permutation function with a Feistel structure (https://en.wikipedia.org/wiki/Feistel_cipher) like this:
Since this function consists of a number of steps, each of which will map the numbers from 0 to N-1 onto the numbers from 0 to N-1 in a bijective way, the entire function will also have this property. If you feed it the numbers from 0 to N-1, you'll get them back out in a pseudo-random order.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With