How to enumerate x^2 + y^2 = z^2 - 1 (with additional constraints)

Question

Let N be a number (10 <= N <= 10^5).

I have to find triples of numbers (x, y, z) that satisfy the following conditions:

1. x <= y <= z
2. x^2 + y^2 = z^2 - 1
3. x + y + z <= N

I have to write a method that counts how many such triples exist for a given N.

I have tried the following, but it takes too long for larger values of N and results in a timeout.

int N= Int32.Parse(Console.ReadLine());
List<String> res = new List<string>();

//x<=y<=z
int mxSqrt = N - 2;
int a = 0, b = 0;
for (int z = 1; z <= mxSqrt; z++)
{
    a = z * z;
    for (int y = 1; y <= z; y++)
    {
        b = y * y;
        for (int x = 1; x <= y; x++)
        {
            int x1 = b + x * x;
            int y1 = a - 1;
            if (x1 == y1 && ((x + y + z) <= N))
            {
                res.Add(x + "," + y + "," + z);
            }
        }
    }
}
Console.WriteLine(res.Count());

My question:

My solution takes too long for larger numbers (I think the nested for loops are the cause). How can I improve it?

Is there a better approach?

גלעד ברקן · Accepted Answer

Here's a method that enumerates the triples, rather than exhaustively testing for them, using number theory as described here: https://mathoverflow.net/questions/29644/enumerating-ways-to-decompose-an-integer-into-the-sum-of-two-squares

The math took me a while to comprehend and to implement (partly by gathering existing code, which is credited in the comments above each piece), and I don't feel like much of an authority on the subject, so I'll leave the details for the reader to research. The method is based on expressing numbers as products of Gaussian integer conjugates: (a + bi)*(a - bi) = a^2 + b^2. We first factor the number z^2 - 1 into primes, decompose the primes into Gaussian conjugates, and form the different products of those factors. Expanding and simplifying each product gives some a + bi, whose norm a^2 + b^2 equals z^2 - 1.
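As a small illustration of the Gaussian integer idea, the sketch below multiplies Gaussian integers stored as (real, imaginary) pairs and checks that the norms multiply. The helper names `gmul` and `norm` are made up for this example and are not part of the answer's code.

```python
def gmul(u, v):
    """Multiply two Gaussian integers given as (real, imag) integer pairs."""
    a, b = u
    c, d = v
    return (a * c - b * d, a * d + b * c)

def norm(u):
    """Norm of a + bi, i.e. (a + bi) * (a - bi) = a^2 + b^2."""
    a, b = u
    return a * a + b * b

# z = 9: z^2 - 1 = 80 = 2^4 * 5, where 4 has norm 16 and 2 + i has norm 5
w = gmul((4, 0), (2, 1))
print(w, norm(w))

# z = 19: z^2 - 1 = 360 = 2^3 * 3^2 * 5, using 2 + 2i (norm 8), 3 (norm 9), 2 + i (norm 5)
v = gmul(gmul((2, 2), (3, 0)), (2, 1))
print(v, norm(v))
```

The two products come out as 8 + 4i and 6 + 18i, which correspond to the triples (8, 4, 9) and (6, 18, 19) in the output listing further down.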

A perk of reading about the Sum of Squares Function is discovering that we can rule out any candidate z^2 - 1 that contains a prime of form 4k + 3 with an odd power. Using that check alone, I was able to reduce Prune's loop on 10^5 from 214 seconds to 19 seconds (on repl.it) using the Rosetta prime factoring code below.
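A minimal sketch of that pruning check on its own, assuming plain trial division instead of the Rosetta factoring code; the names `factorize` and `may_be_sum_of_two_squares` are hypothetical and chosen only for this example.

```python
def factorize(n):
    """Trial-division factorization; returns {prime: exponent}."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def may_be_sum_of_two_squares(n):
    """False if some prime of the form 4k + 3 divides n to an odd power."""
    return all(not (p % 4 == 3 and e % 2 == 1)
               for p, e in factorize(n).items())

# Keep only the z whose z^2 - 1 passes the check
candidates = [z for z in range(2, 36) if may_be_sum_of_two_squares(z * z - 1)]
print(candidates)
```

For z below 36 this keeps only 3, 9, 17, 19, 33 and 35, the same z values that appear in the output listing of the full program.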

The implementation here is just a demonstration. It has no handling or optimisation for limiting x and y; it simply enumerates as it goes.

Python code (Python 2):

# https://math.stackexchange.com/questions/5877/efficiently-finding-two-squares-which-sum-to-a-prime
def mods(a, n):
    if n <= 0:
        return "negative modulus"
    a = a % n
    if (2 * a > n):
        a -= n
    return a

def powmods(a, r, n):
    out = 1
    while r > 0:
        if (r % 2) == 1:
            r -= 1
            out = mods(out * a, n)
        r /= 2
        a = mods(a * a, n)
    return out

def quos(a, n):
    if n <= 0:
        return "negative modulus"
    return (a - mods(a, n))/n

def grem(w, z):
    # remainder in Gaussian integers when dividing w by z
    (w0, w1) = w
    (z0, z1) = z
    n = z0 * z0 + z1 * z1
    if n == 0:
        return "division by zero"
    u0 = quos(w0 * z0 + w1 * z1, n)
    u1 = quos(w1 * z0 - w0 * z1, n)
    return(w0 - z0 * u0 + z1 * u1,
           w1 - z0 * u1 - z1 * u0)

def ggcd(w, z):
    while z != (0,0):
        w, z = z, grem(w, z)
    return w

def root4(p):
    # 4th root of 1 modulo p
    if p <= 1:
        return "too small"
    if (p % 4) != 1:
        return "not congruent to 1"
    k = p/4
    j = 2
    while True:
        a = powmods(j, k, p)
        b = mods(a * a, p)
        if b == -1:
            return a
        if b != 1:
            return "not prime"
        j += 1

def sq2(p):
    if p % 4 != 1:
      return "not congruent to 1 modulo 4"
    a = root4(p)
    return ggcd((p,0),(a,1))

# https://rosettacode.org/wiki/Prime_decomposition#Python:_Using_floating_point
from math import floor, sqrt

def fac(n):
    step = lambda x: 1 + (x<<2) - ((x>>1)<<1)
    maxq = long(floor(sqrt(n)))
    d = 1
    q = n % 2 == 0 and 2 or 3 
    while q <= maxq and n % q != 0:
        q = step(d)
        d += 1
    return q <= maxq and [q] + fac(n//q) or [n]

# My code...
# An answer for  https://stackoverflow.com/questions/54110614/

from collections import Counter
from itertools import product
from sympy import I, expand, Add

def valid(ps):
  for (p, e) in ps.items():
    if (p % 4 == 3) and (e & 1):
      return False
  return True

def get_sq2(p, e):
  if p == 2:
    if e & 1:
      return [2**(e / 2), 2**(e / 2)]
    else:
      return [2**(e / 2), 0]
  elif p % 4 == 3:
    return [p, 0]
  else:
    a,b = sq2(p)
    return [abs(a), abs(b)]

def get_terms(cs, e):
  if e == 1:
    return [Add(cs[0], cs[1] * I)]
  res = [Add(cs[0], cs[1] * I)**e]
  for t in xrange(1, e / 2 + 1):
    res.append(
      Add(cs[0] + cs[1]*I)**(e-t) * Add(cs[0] - cs[1]*I)**t)
  return res

def get_lists(ps):
  items = ps.items()
  lists = []
  for (p, e) in items:
    if p == 2:
      a,b = get_sq2(2, e)
      lists.append([Add(a, b*I)])
    elif p % 4 == 3:
      a,b = get_sq2(p, e)
      lists.append([Add(a, b*I)**(e / 2)])
    else:
      lists.append(get_terms(get_sq2(p, e), e))
  return lists


def f(n):
  for z in xrange(2, n / 2):
    zz = (z + 1) * (z - 1)
    ps = Counter(fac(zz))
    is_valid = valid(ps)
    if is_valid:
      print "valid (does not contain a prime of form\n4k + 3 with an odd power)"
      print "z: %s, primes: %s" % (z, dict(ps))
      lists = get_lists(ps)
      cartesian = product(*lists)
      for element in cartesian:
        print "prime square decomposition: %s" % list(element)
        p = 1
        for item in element:
          p *= item
        print "complex conjugates: %s" % p
        vals = p.expand(complex=True, evaluate=True).as_coefficients_dict().values()
        x, y = vals[0], vals[1] if len(vals) > 1 else 0
        print "x, y, z: %s, %s, %s" % (x, y, z)
        print "x^2 + y^2, z^2-1: %s, %s" % (x**2 + y**2, z**2 - 1)
      print ''

if __name__ == "__main__":
  f(100)  # f prints its results itself and returns None

Output:

valid (does not contain a prime of form
4k + 3 with an odd power)
z: 3, primes: {2: 3}
prime square decomposition: [2 + 2*I]
complex conjugates: 2 + 2*I
x, y, z: 2, 2, 3
x^2 + y^2, z^2-1: 8, 8

valid (does not contain a prime of form
4k + 3 with an odd power)
z: 9, primes: {2: 4, 5: 1}
prime square decomposition: [4, 2 + I]
complex conjugates: 8 + 4*I
x, y, z: 8, 4, 9
x^2 + y^2, z^2-1: 80, 80

valid (does not contain a prime of form
4k + 3 with an odd power)
z: 17, primes: {2: 5, 3: 2}
prime square decomposition: [4 + 4*I, 3]
complex conjugates: 12 + 12*I
x, y, z: 12, 12, 17
x^2 + y^2, z^2-1: 288, 288

valid (does not contain a prime of form
4k + 3 with an odd power)
z: 19, primes: {2: 3, 3: 2, 5: 1}
prime square decomposition: [2 + 2*I, 3, 2 + I]
complex conjugates: (2 + I)*(6 + 6*I)
x, y, z: 6, 18, 19
x^2 + y^2, z^2-1: 360, 360

valid (does not contain a prime of form
4k + 3 with an odd power)
z: 33, primes: {17: 1, 2: 6}
prime square decomposition: [4 + I, 8]
complex conjugates: 32 + 8*I
x, y, z: 32, 8, 33
x^2 + y^2, z^2-1: 1088, 1088

valid (does not contain a prime of form
4k + 3 with an odd power)
z: 35, primes: {17: 1, 2: 3, 3: 2}
prime square decomposition: [4 + I, 2 + 2*I, 3]
complex conjugates: 3*(2 + 2*I)*(4 + I)
x, y, z: 18, 30, 35
x^2 + y^2, z^2-1: 1224, 1224

Prune · Answer

Here is a simple improvement in Python (converting to the faster equivalent in C-based code is left as an exercise for the reader). To get accurate timing for the computation, I removed printing the solutions themselves (after validating them in a previous run).

Use an outer loop for one free variable (I chose z), constrained only by its relation to N.
Use an inner loop (I chose y) constrained by the outer loop index.
The third variable is directly computed per requirement 2.

Timing results:

-------------------- 10 
 1 solutions found in 2.3365020751953125e-05  sec.
-------------------- 100 
 6 solutions found in 0.00040078163146972656  sec.
-------------------- 1000 
 55 solutions found in 0.030081748962402344  sec.
-------------------- 10000 
 543 solutions found in 2.2078349590301514  sec.
-------------------- 100000 
 5512 solutions found in 214.93411707878113  sec.

That's 3:35 for the large case, plus your time to collate and/or print the results.

If you need faster code (this is still pretty brute-force), look into Diophantine equations and parameterizations to generate (y, x) pairs, given the target value of z^2 - 1.

import math
import time

def break3(N):
    """
    10 <= N <= 10^5
    return x, y, z triples such that:
        x <= y <= z
        x^2 + y^2 = z^2 - 1        
        x + y + z <= N
    """

    """
    Observations:
    z <= x + y
    z < N/2
    """

    count = 0
    z_limit = N // 2
    for z in range(3, z_limit):

        # Since y >= x, there's a lower bound on y
        target = z*z - 1
        ymin = int(math.sqrt(target/2))
        for y in range(ymin, z):
            # Given y and z, compute x.
            # That's a solution iff x is integer.
            x_target = target - y*y
            x = int(math.sqrt(x_target))
            if x*x == x_target and x+y+z <= N:
                # print("solution", x, y, z)
                count += 1

    return count


test = [10, 100, 1000, 10**4, 10**5]
border = "-"*20

for case in test: 
    print(border, case)
    start = time.time()
    print(break3(case), "solutions found in", time.time() - start, "sec.")
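One possible way to follow the suggestion about generating pairs directly, not necessarily what this answer had in mind, is to rewrite the equation as x^2 + 1 = (z - y)(z + y) and enumerate divisor pairs of x^2 + 1. If x were odd, x^2 + 1 would be 2 modulo 4 and could not be split into two factors of equal parity, so only even x needs to be tried. The function name `count_triples` is hypothetical.

```python
def count_triples(N):
    """Count triples x <= y <= z with x^2 + y^2 = z^2 - 1 and x + y + z <= N."""
    count = 0
    x = 2
    while 3 * x < N:                  # x <= y < z forces 3x < x + y + z <= N
        target = x * x + 1            # equals (z - y) * (z + y); odd because x is even
        for d in range(1, x + 1, 2):  # d = z - y, the smaller (odd) divisor
            if target % d:
                continue
            e = target // d           # e = z + y
            y = (e - d) // 2
            z = (e + d) // 2
            if y >= x and x + y + z <= N:
                count += 1
        x += 2
    return count

for n in (10, 100, 1000):
    print(n, count_triples(n))
```

This still does quadratic work in N, since every odd d up to x is tried, but it needs no square roots and each hit is a solution by construction. For N = 10, 100 and 1000 it gives the same counts as the timing table above.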

Mat · Answer

The bounds of x and y are an important part of the problem. I personally went with this Wolfram Alpha query and checked the exact forms of the variables.

Thanks to @Bleep-Bloop and comments, a very elegant bound optimization was found, which is x < n and x <= y < n - x. The results are the same and the times are nearly identical.

Also, since x and y can only be positive even integers, we can step both loops by 2 and halve the number of iterations of each loop.

To optimize even further, since we compute the upper bound of x, we build a list of all possible values for x and make the computation parallel. That saves a massive amount of time on higher values of N but it's a bit slower for smaller values because of the overhead of the parallelization.

Here's the final code:

Non-parallel version, with int values:

List<string> res = new List<string>();
int n2 = n * n;

double maxX = 0.5 * (2.0 * n - Math.Sqrt(2) * Math.Sqrt(n2 + 1));

for (int x = 2; x < maxX; x += 2)
{
    int maxY = (int)Math.Floor((n2 - 2.0 * n * x - 1.0) / (2.0 * n - 2.0 * x));

    for (int y = x; y <= maxY; y += 2)
    {
        int z2 = x * x + y * y + 1;
        int z = (int)Math.Sqrt(z2);

        if (z * z == z2 && x + y + z <= n)
            res.Add(x + "," + y + "," + z);
    }
}
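One way to arrive at the closed-form bounds maxX and maxY used in the loop above is sketched below. This is a reconstruction from the three conditions, not a quote of the Wolfram Alpha query; squaring is valid because n - x - y must be positive, and dividing by n - x is valid because x < n.

```latex
\begin{align*}
x + y + \sqrt{x^2 + y^2 + 1} &\le n \\
x^2 + y^2 + 1 &\le (n - x - y)^2 = n^2 + x^2 + y^2 - 2nx - 2ny + 2xy \\
2y(n - x) &\le n^2 - 2nx - 1 \\
y &\le \frac{n^2 - 2nx - 1}{2n - 2x}
\qquad \text{(maxY)} \\[1ex]
\text{with } y = x:\quad 2x(n - x) &\le n^2 - 2nx - 1 \\
2x^2 - 4nx + n^2 - 1 &\ge 0 \\
x &\le \frac{4n - \sqrt{8n^2 + 8}}{4}
  = \frac{1}{2}\left(2n - \sqrt{2}\,\sqrt{n^2 + 1}\right)
\qquad \text{(maxX)}
\end{align*}
```

The last step takes the smaller root of the quadratic, since the larger root would give x > n.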

Parallel version, with long values:

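using System.Collections.Concurrent; // ConcurrentBag
using System.Collections.Generic;    // List
using System.Linq;
using System.Threading.Tasks;        // Parallel

...

// Use ConcurrentBag for thread safety
ConcurrentBag<string> res = new ConcurrentBag<string>();
long n2 = (long)n * n; // cast first so the multiplication is done in 64 bits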

double maxX = 0.5 * (2.0 * n - Math.Sqrt(2) * Math.Sqrt(n2 + 1L));

// Build list to parallelize
int nbX = Convert.ToInt32(maxX);
List<int> xList = new List<int>();
for (int x = 2; x < maxX; x += 2)
    xList.Add(x);

Parallel.ForEach(xList, x =>
{
    int maxY = (int)Math.Floor((n2 - 2.0 * n * x - 1.0) / (2.0 * n - 2.0 * x));

    for (long y = x; y <= maxY; y += 2)
    {
        long z2 = x * x + y * y + 1L;
        long z = (long)Math.Sqrt(z2);

        if (z * z == z2 && x + y + z <= n)
            res.Add(x + "," + y + "," + z);
    }
});

When run individually on an i5-8400 CPU, I get these results:

N: 10; Solutions: 1; Time elapsed: 0.03 ms (Not parallel, int)

N: 100; Solutions: 6; Time elapsed: 0.05 ms (Not parallel, int)

N: 1000; Solutions: 55; Time elapsed: 0.3 ms (Not parallel, int)

N: 10000; Solutions: 543; Time elapsed: 13.1 ms (Not parallel, int)

N: 100000; Solutions: 5512; Time elapsed: 849.4 ms (Parallel, long)

You must use long when N is greater than 46340, because N squared then exceeds the maximum value of an int (2,147,483,647). Finally, the parallel version starts to outperform the simple one when N is around 23000, with ints.
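The point at which n * n stops fitting in a 32-bit signed int can be checked directly. The snippet below is only a quick verification, written in Python because its integers do not overflow; `INT32_MAX` and `largest_safe_n` are names chosen for this example.

```python
import math

INT32_MAX = 2**31 - 1                    # same value as C#'s int.MaxValue
largest_safe_n = math.isqrt(INT32_MAX)   # largest n with n * n <= INT32_MAX

print(largest_safe_n)
print(largest_safe_n**2 <= INT32_MAX < (largest_safe_n + 1)**2)
```

This prints 46340 and True: 46340 squared still fits in an int, while 46341 squared does not.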

404 · Answer

I had no time to test it properly, but it seems to yield the same results as your code (6 results for N = 100 and 55 results for N = 1000).

With N = 1000 it takes 2 ms versus 144 ms for your code (also without the List),

and with N = 10000 it takes 28 ms.

var N = 1000;
var c = 0;

for (int x = 2; x < N; x+=2)
{
    for (int y = x; y < (N - x); y+=2)
    {
        long z2 = (long)x * x + (long)y * y + 1; // 64-bit arithmetic avoids int overflow for large N
        long z = (long)Math.Sqrt(z2);
        if (x + y + z > N)
            break;
        if (z * z == z2)
            c++;
    }
}

Console.WriteLine(c);

Justice_Lords · Answer

#include<iostream>
#include<math.h>
int main()
{
    int N = 10000;
    int c = 0;
    for (int x = 2; x < N; x+=2)
    {
        for (int y = x; y < (N - x); y+=2)
        {
            auto z = sqrt((double)x * x + (double)y * y + 1); // double arithmetic avoids int overflow for large N
            if(x+y+z>N){
                break;
            }
            if (z - (int) z == 0)
            {
                c++;
            }
        }
    }
    std::cout<<c;
}

This is my solution. While testing the previous solutions to this problem, I found that x and y are always even and z is always odd. I don't know the mathematical reason behind this; I am currently trying to figure it out.
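The parity pattern can be explained with arithmetic modulo 4. A square is always 0 or 1 modulo 4, so x^2 + y^2 + 1 is 1, 2 or 3 modulo 4 depending on whether none, one or both of x and y are odd. Only the value 1 can be a square, which forces x and y to be even and z to be odd. The short check below, a sketch with names chosen for this example, enumerates all residues to confirm it.

```python
# Squares are congruent to 0 or 1 modulo 4.
square_residues = {(k * k) % 4 for k in range(4)}

# Collect every (x mod 2, y mod 2, z mod 2) combination that is
# compatible with x^2 + y^2 + 1 == z^2 (mod 4).
parities = {
    (x % 2, y % 2, z % 2)
    for x in range(4) for y in range(4) for z in range(4)
    if (x * x + y * y + 1) % 4 == (z * z) % 4
}

print(square_residues)
print(parities)
```

The only surviving combination is (0, 0, 1): x even, y even, z odd.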

Tags: c#, algorithm, number-theory, nonlinear-functions

A_Sk
