Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm about number theory

Tags:

c++

algorithm

Give two positive integer a,b (1<=a<=30, 1<=b<=10000000) ,and define two unrepeatable set L and R,

L = {x * y | 1 <= x <= a, 1 <= y <= b, x,y is integer}

R = {x ^ y | 1 <= x <= a, 1 <= y <= b, x,y is integer},

^ is XOR operate

For any two integer: A∈L, B∈R, we format B to n+1(n is the decimal digit number of b) decimal digit(fill 0 in front of B),and then joint B to the end of A and get a new integer AB.

Compute the sum of all generated integer AB (In case the sum exceed, just return "sum mod 1000000007", mod means modular operation)

Note: the time of your algorithm is no more than 3 seconds

My algorithm is very simple:we can easily get the max number in set R, and the element in R is 0,1,2,3...maxXor,(the element max(a,b) may be not in R), using a hash table the compute set L. But the algorithm consume 4 seconds when a = 30, b = 100000.


Give an example:

a = 2, b = 4, so

L = {1 * 1, 1 * 2, 1 * 3, 1 * 4, 2 * 1, 2 * 2, 2 * 3, 2 * 4} =  {1, 2, 3, 4, 6, 8}

R =  {1^1,1^2,1^3,1^4,2^1,2^2,2^3,2^4} =  {0, 1, 2, 3, 5, 6}

All generated integer AB is:

{

100, 101, 102, 103, 105, 106,

200, 201, 202, 203, 205, 206,

300, 301, 302, 303, 305, 306,

400, 401, 402, 403, 405, 406,

600, 601, 602, 603, 605, 606,

800, 801, 802, 803, 805, 806

}

The sum of all AB is 14502

like image 405
tenos Avatar asked Dec 14 '13 13:12

tenos


1 Answers

So the number AB can be written as 10^(n+1) A + B. Which means that summing over all A, B, the total is equal to

|R| 10^(n+1) Sum(A in L) + |L| Sum(B in R)

In your example,

|L| = 6
|R| = 6
Sum(A in L) = 24
Sum(B in R) = 17
n = 3

which when plugged into the above formula gives 14,502.

This reduces the runtime in the size of the sets from quadratic to linear, so you should see quite a huge improvement.

The next bits I haven't fleshed out fully because I don't have the time to, but they feel like they should work:

  • First, notice that Sum(A in L) would be trivial to calculate using

    1 + 2 + .. + n = n(n-1)/2
    

    if there wasn't the constraint that L doesn't contain repeats. You can get around this though by exploiting the fact that a is very small: iteratively calculate the sums 1, .., a using the triangular number formula and use that information to avoid counting a product more than once.

  • For Sum(B in R), notice that when you compare y and x^y, at most the first lg(a) bits have changed. So you can split a sum of x^ys into two sums: one which deals with the bits from lg(a)+1 upwards and which depends only on b, and a second, more complex sum which deals with the bits from lg(a) downwards and which depends on a and b.

Edit: The OP's asked me to expand on how to quickly compute Sum(A in L). There was a lottt of stuff in this section in previous edits, but I've actually sat down and worked through it now rather than haphazardly batting it around in my head. It also turned out to be more complicated than I expected, so my apologies for not sitting down and working through it sooner @tenos.

So what we want to do is take the sum of all distinct products x*y such that 1 <= x <= a and 1 <= y <= b. Well, that turns out to be pretty hard so let's start with a simpler problem: given two integers x1, x2 with x1 < x2, how can we compute the sum of all distinct products x1*y or x2*y where 1 <= y <= b?

If we dropped the distinctness criterion, this'd be easy: it'd simply be

x1*Sum(b) + x2*Sum(b)

where Sum(j) denotes the sum of integers 1 through j inclusive, and can be calculated using Gauss's formula for the triangular numbers. So again we can reduce the problem into something simpler: how can we find the sum of all products that appear in both the left and right terms?

Well, two products are equal if

x1*y1 == x2*y2

This happens exactly when x1*y1 == x2*y2 == k*LCM(x1, x2), where LCM is the lowest common multiple and k is some integer.

The sum of this over all k such that 1 <= k*LCM(x1, x2) <= x1*b is

R(x1, x2) = LCM(x1, x2) * Sum(x1*b/LCM(x1, x2))

where R stands for "repeats". Which means that our sum of all distinct products x1*y or x2*y where 1 <= y <= b is

x1*Sum(b) + x2*Sum(b) - R(x1, x2)

Next, let's extend the definition of R to be defined on three variables x1 < x2 < x3 as

R(x1, x2, x3) = LCM(x1, x2, x3) * Sum(x1*b/LCM(x1, x2, x3))

and similarly for 4 variables, 5 variables, etc. Then the sum of distinct products for three x1 < x2 < x3 is

x1*Sum(b) + x2*Sum(b) + x3*Sum(b) - R(x1, x2) - R(x1, x3) - R(x2, x3) + R(x1, x2, x3)

by the inclusion-exclusion principle.

So, let's make use of this. Define

Sum for x = 1: 1*Sum(b)
Sum for x = 2: 2*Sum(b) - R(2, 1)
Sum for x = 3: 3*Sum(b) - R(3, 2) - R(3, 1) + R(3, 2, 1)

Etc. Then the sum of all these sums up to x = a is the sum of all distinct products.

Edit: @tenos turned this into a useful solution. He noticed that since i*Sum(b) contains many repeats, we can replace by i*sum(k...b), k = max(b/minPrimeFactor(i) + 1, i).

Further, when using inclusion-exclusion principle, many unnecessary computations can be pruned. For instance, if R(1,2) = NULL, there is no need to compute R(1,2,3), R(1,2,4).., etc. In fact, when b is very big, there are many R(i,..j) = NULL.

like image 197
Andy Jones Avatar answered Oct 19 '22 20:10

Andy Jones