I have some simple code that does the following.
It iterates over all possible length n lists F
with +-1 entries. For each one, it iterates over all possible length 2n
lists S
with +-1 entries, where the first half of $S$ is simply a copy of the second half. The code computes the inner product of F
with each sublist of S
of length n
. For each F, S it counts the inner products that are zero until the first non-zero inner product.
Here is the code.
#!/usr/bin/python
from __future__ import division
import itertools
import operator
import math
n=14
m=n+1
def innerproduct(A, B):
assert (len(A) == len(B))
s = 0
for k in xrange(0,n):
s+=A[k]*B[k]
return s
leadingzerocounts = [0]*m
for S in itertools.product([-1,1], repeat = n):
S1 = S + S
for F in itertools.product([-1,1], repeat = n):
i = 0
while (i<m):
ip = innerproduct(F, S1[i:i+n])
if (ip == 0):
leadingzerocounts[i] +=1
i+=1
else:
break
print leadingzerocounts
The correct output for n=14
is
[56229888, 23557248, 9903104, 4160640, 1758240, 755392, 344800, 172320, 101312, 75776, 65696, 61216, 59200, 59200, 59200]
Using pypy, this takes 1 min 18 seconds for n = 14. Unfortunately, I would really like to run it for 16,18,20,22,24,26. I don't mind using numba or cython but I would like to stay close to python if at all possible.
Any help speeding this up is very much appreciated.
I'll keep a record of the fastest solutions here. (Please let me know if I miss an updated answer.)
This new code gets another order of magnitude speedup by taking advantage of the cyclic symmetry of the problem. This Python version enumerates necklaces with Duval's algorithm; the C version uses brute force. Both incorporate the speedups described below. On my machine, the C version solves n = 20 in 100 seconds! A back-of-the-envelope calculation suggests that, if you were to let it run for a week on a single core, it could do n = 26, and, as noted below, it's amenable to parallelism.
import itertools
def necklaces_with_multiplicity(n):
assert isinstance(n, int)
assert n > 0
w = [1] * n
i = 1
while True:
if n % i == 0:
s = sum(w)
if s > 0:
yield (tuple(w), i * 2)
elif s == 0:
yield (tuple(w), i)
i = n - 1
while w[i] == -1:
if i == 0:
return
i -= 1
w[i] = -1
i += 1
for j in range(n - i):
w[i + j] = w[j]
def leading_zero_counts(n):
assert isinstance(n, int)
assert n > 0
assert n % 2 == 0
counts = [0] * n
necklaces = list(necklaces_with_multiplicity(n))
for combo in itertools.combinations(range(n - 1), n // 2):
for v, multiplicity in necklaces:
w = list(v)
for j in combo:
w[j] *= -1
for i in range(n):
counts[i] += multiplicity * 2
product = 0
for j in range(n):
product += v[j - (i + 1)] * w[j]
if product != 0:
break
return counts
if __name__ == '__main__':
print(leading_zero_counts(12))
C version:
#include <stdio.h>
enum {
N = 14
};
struct Necklace {
unsigned int v;
int multiplicity;
};
static struct Necklace g_necklace[1 << (N - 1)];
static int g_necklace_count;
static void initialize_necklace(void) {
g_necklace_count = 0;
for (unsigned int v = 0; v < (1U << (N - 1)); v++) {
int multiplicity;
unsigned int w = v;
for (multiplicity = 2; multiplicity < 2 * N; multiplicity += 2) {
w = ((w & 1) << (N - 1)) | (w >> 1);
unsigned int x = w ^ ((1U << N) - 1);
if (w < v || x < v) goto nope;
if (w == v || x == v) break;
}
g_necklace[g_necklace_count].v = v;
g_necklace[g_necklace_count].multiplicity = multiplicity;
g_necklace_count++;
nope:
;
}
}
int main(void) {
initialize_necklace();
long long leading_zero_count[N + 1];
for (int i = 0; i < N + 1; i++) leading_zero_count[i] = 0;
for (unsigned int v_xor_w = 0; v_xor_w < (1U << (N - 1)); v_xor_w++) {
if (__builtin_popcount(v_xor_w) != N / 2) continue;
for (int k = 0; k < g_necklace_count; k++) {
unsigned int v = g_necklace[k].v;
unsigned int w = v ^ v_xor_w;
for (int i = 0; i < N + 1; i++) {
leading_zero_count[i] += g_necklace[k].multiplicity;
w = ((w & 1) << (N - 1)) | (w >> 1);
if (__builtin_popcount(v ^ w) != N / 2) break;
}
}
}
for (int i = 0; i < N + 1; i++) {
printf(" %lld", 2 * leading_zero_count[i]);
}
putchar('\n');
return 0;
}
You can get a bit of speedup by exploiting the sign symmetry (4x) and by iterating over only those vectors that pass the first inner product test (asymptotically, O(sqrt(n))x).
import itertools
n = 10
m = n + 1
def innerproduct(A, B):
s = 0
for k in range(n):
s += A[k] * B[k]
return s
leadingzerocounts = [0] * m
for S in itertools.product([-1, 1], repeat=n - 1):
S1 = S + (1,)
S1S1 = S1 * 2
for C in itertools.combinations(range(n - 1), n // 2):
F = list(S1)
for i in C:
F[i] *= -1
leadingzerocounts[0] += 4
for i in range(1, m):
if innerproduct(F, S1S1[i:i + n]):
break
leadingzerocounts[i] += 4
print(leadingzerocounts)
C version, to get a feel for how much performance we're losing to PyPy (16 for PyPy is roughly equivalent to 18 for C):
#include <stdio.h>
enum {
HALFN = 9,
N = 2 * HALFN
};
int main(void) {
long long lzc[N + 1];
for (int i = 0; i < N + 1; i++) lzc[i] = 0;
unsigned int xor = 1 << (N - 1);
while (xor-- > 0) {
if (__builtin_popcount(xor) != HALFN) continue;
unsigned int s = 1 << (N - 1);
while (s-- > 0) {
lzc[0]++;
unsigned int f = xor ^ s;
for (int i = 1; i < N + 1; i++) {
f = ((f & 1) << (N - 1)) | (f >> 1);
if (__builtin_popcount(f ^ s) != HALFN) break;
lzc[i]++;
}
}
}
for (int i = 0; i < N + 1; i++) printf(" %lld", 4 * lzc[i]);
putchar('\n');
return 0;
}
This algorithm is embarrassingly parallel because it's just accumulating over all values of xor
. With the C version, a back-of-the-envelope calculation suggests that a few thousand hours of CPU time would suffice to calculate n = 26
, which works out to a couple hundred dollars at current rates on EC2. There are undoubtedly some optimizations to be made (e.g., vectorization), but for a one-off like this I'm not sure how much more programmer effort is worthwhile.
One very simple speed up of a factor of n is to change this code:
def innerproduct(A, B):
assert (len(A) == len(B))
for j in xrange(len(A)):
s = 0
for k in xrange(0,n):
s+=A[k]*B[k]
return s
to
def innerproduct(A, B):
assert (len(A) == len(B))
s = 0
for k in xrange(0,n):
s+=A[k]*B[k]
return s
(I don't know why you have the loop over j, but it just does the same calculation each time so is unnecessary.)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With