Fastest way to Factor (Prime-1)/2 for 64-bit Prime?

Question

I'm trying to gather some statistics on prime numbers, among which is the distribution of factors for the number (prime-1)/2. I know there are general formulas for the size of factors of uniformly selected numbers, but I haven't seen anything about the distribution of factors of one less than a prime.

I've written a program to iterate through primes starting at the first prime after 2^63, and then factor the (prime - 1)/2 using trial division by all primes up to 2^32. However, this is extremely slow because that is a lot of primes (and a lot of memory) to iterate through. I store the primes as a single byte each (by storing the increment from one prime to the next). I also use a deterministic variant of the Miller-Rabin primality test for numbers up to 2^64, so I can easily detect when the remaining value (after a successful division) is prime.

I've experimented using a variant of pollard-rho and elliptic curve factorization, but it is hard to find the right balance of between trial division and switching to these more complicated methods. Also I'm not sure I've implemented them correctly, because sometimes they seem to take a very lone time to find a factor, and based on their asymptotic behavior, I'd expect them to be quite quick for such small numbers.

I have not found any information on factoring many numbers (vs just trying to factor one), but it seems like there should be some way to speed up the task by taking advantage of this.

Any suggestions, pointers to alternate approaches, or other guidance on this problem is greatly appreciated.

Edit: The way I store the primes is by storing an 8-bit offset to the next prime, with the implicit first prime being 3. Thus, in my algorithms, I have a separate check for division by 2, then I start a loop:

factorCounts = collections.Counter()
while N % 2 == 0:
    factorCounts[2] += 1
    N //= 2
pp = 3
for gg in smallPrimeGaps:
    if pp*pp > N:
        break
    if N % pp == 0:
        while N % pp == 0:
            factorCounts[pp] += 1
            N //= pp
    pp += gg

Also, I used a wheel sieve to calculate the primes for trial division, and I use an algorithm based on the remainder by several primes to get the next prime after the given starting point.

I use the following for testing if a given number is prime (porting code to c++ now):

bool IsPrime(uint64_t n)
{
    if(n < 341531)
        return MillerRabinMulti(n, {9345883071009581737ull});
    else if(n < 1050535501)
        return MillerRabinMulti(n, {336781006125ull, 9639812373923155ull});
    else if(n < 350269456337)
        return MillerRabinMulti(n, {4230279247111683200ull, 14694767155120705706ull, 1664113952636775035ull});
    else if(n < 55245642489451)
        return MillerRabinMulti(n, {2ull, 141889084524735ull, 1199124725622454117, 11096072698276303650});
    else if(n < 7999252175582851)
        return MillerRabinMulti(n, {2ull, 4130806001517ull, 149795463772692060ull, 186635894390467037ull, 3967304179347715805ull});
    else if(n < 585226005592931977)
        return MillerRabinMulti(n, {2ull, 123635709730000ull, 9233062284813009ull, 43835965440333360ull, 761179012939631437ull, 1263739024124850375ull});
    else
        return MillerRabinMulti(n, {2ull, 325ull, 9375ull, 28178ull, 450775ull, 9780504ull, 1795265022ull});
}

user448810 · Accepted Answer

I don't have a definitive answer, but I do have some observations and some suggestions.

There are about 2*10^17 primes between 2^63 and 2^64, so any program you write is going to run for a while.

Let's talk about a primality test for numbers in the range 2^63 to 2^64. Any general-purpose test will do more work than you need, so you can speed things up by writing a special-purpose test. I suggest strong-pseudoprime tests (as in Miller-Rabin) to bases 2 and 3. If either of those tests shows the number is composite, you're done. Otherwise, look up the number (binary search) in a table of strong-pseudoprimes to bases 2 and 3 (ask Google to find those tables for you). Two strong pseudoprime tests followed by a table lookup will certainly be faster than the deterministic Miller-Rabin test you are currently performing, which probably uses six or seven bases.

For factoring, trial division to 1000 followed by Brent-Rho until the product of the known prime factors exceeds the cube root of the number being factored ought to be fairly fast, a few milliseconds. Then, if the remaining cofactor is composite, it will necessarily have only two factors, so SQUFOF would be a good algorithm to split them, faster than the other methods because all the arithmetic is done with numbers less than the square root of the number being factored, which in your case means the factorization could be done using 32-bit arithmetic instead of 64-bit arithmetic, so it ought to be fast.

Instead of factoring and primality tests, a better method uses a variant of the Sieve of Eratosthenes to factor large blocks of numbers. That will still be slow, as there are 203 million sieving primes less than 2^32, and you will need to deal with the bookkeeping of a segmented sieve, but considering that you factor lots of numbers at once, it's probably the best approach to your task.

I have code for everything mentioned above at my blog.

Arty · Answer

Very interesting question you have! Decided to code my own full solution from scratch in C++.

I have 3 main math functions (and several non-math) in my code:

GenPrimesSieve_Eratosthenes() which is used to generate all 32-bit primes, there are total 203280221 (see here) primes that are 32-bit. These primes are used in functions mentioned in 2) and 3). This function uses Sieve of Eratosthenes.
PrimeSieve() is used for sieving a range suggested by you starting from 2^63. This function sieves only prime numbers. Algorithm similar to Eratosthenes is used.
FactorSieve() is used to use sieve to factor every number in a range. Actually I factor only those that have 2 * n + 1 as prime, i.e. those from range sieved by PrimeSieve().

All 3 functions above do almost same kind of work:

They take some small 32-bit prime P and sieve a range through it. For that initial offset near range start is computed. It is the smallest offset after range start that is a multiple of P.
Then using For loop I iterate with step P over whole range.
For the case of Sieve of Eratosthenes and also PrimeSieve() I mark every P's number as composite. For the case of FactorSieve() I store P as prime factor inside each number where P hits in the range.
Sieve of Eratosthens and PrimeSieve() they both go through bit vector where composites are marked and every non-composite is stored into final answer vector (array).
PrimeSieve() doesn't have bit-vector of composites, but instead while sieving it collected into vector-of-vector, it collected all prime factors. Only 32-bit factors are collected through sieving. To compute remaining 64-bit prime factor (it can be only one at most), I multiply all prime factors and divide number itself by this product.
By theorem taken from Sieve of Eratosthenes it is known that it is enough to sieve by at most P = Sqrt(N), which implies that in FactorSieve() for 64-bit numbers to find 64-bit prime factor it is enough to sieve through only 32-bit primes. Also for PrimeSieve() it is enough to sieve by only all 32-bit numbers to PROVE that 64-bit number is prime.

There are also several other functions, for example big one is ProcessRange() which call PrimeSieve() and FactorSieve() and then outputs generated primes P above 2^63 and factors of (P - 1) / 2, outputs them to file.

All config variables are located at start of .cpp file starting from the line where output file name "output.txt" is located.

Another big function is Main() which first computes all 32-bit primes using Sieve of Eratosthenes, and then using this primes it calls ProcessRange() which creates separate threads for each range using std::async(), thus it uses all CPU cores that you have, starting as many threads as there are hardware threads (usually number of cores multiplied by 2 due to hyper-threading technology), number of threads is computed using std::thread::hardware_concurrency.

Also std::mutex is used for synchronization of threads and data, and std::this_thread::yield() for giving away time to other threads when current thread is iddle. std::chrono::high_resolution_clock::now() is used to measure time. And std::vector is used as container for all numbers and primes.

Resulting primes and factors are outputted to "output.txt" file, and progress statistics is written to console using std::cout.

Try it online!

#include <cstdint>
#include <vector>
#include <iostream>
#include <iomanip>
#include <fstream>
#include <stdexcept>
#include <optional>
#include <future>
#include <string>
#include <chrono>
#include <ctime>
#include <sstream>
#include <algorithm>
#include <mutex>
#include <atomic>
#include <thread>

#define ASSERT_MSG(cond, msg) { if (!(cond)) throw std::runtime_error("Assertion (" #cond ") failed at line " + std::to_string(__LINE__) + "! Msg: '" + std::string(msg) + "'."); }
#define ASSERT(cond) ASSERT_MSG(cond, "")
#define COUT(code) { std::lock_guard<std::recursive_mutex> lock(g_cout_mux); std::cout << TimeUtcStr() << ": " code; std::cout << std::flush; }
#define LN { std::cout << "LN " << std::setw(5) << __LINE__ << "  " << __FUNCTION__ << "  " << __FILE__ << std::endl << std::flush; }

static std::recursive_mutex g_cout_mux;

using u8 = uint8_t;
using u32 = uint32_t;
using u64 = uint64_t;

static std::string const out_fname = "output.txt";
u64 constexpr
    out_file_buf_size = 16ULL << 20, // This much bytes is used for output file buffering
    base_range_bits = 63, // Start sieving from 2^base_range_bits + 2^small_range_bits * small_irange_start
    small_range_bits = 24, // Small range bits sieved in single thread, in this case [2^63 + 2^24 * 5, 2^63 + 2^24 * (5 + 1)).
    small_irange_start = 0; // Which small range index should start from, all previous ranges should be sieved already

static double TimeSec() {
    static auto const gtb = std::chrono::high_resolution_clock::now();
    return std::chrono::duration_cast<std::chrono::duration<double>>(std::chrono::high_resolution_clock::now() - gtb)
        .count();
}

static size_t NThreads() {
    return std::thread::hardware_concurrency();
}

static std::string TimeUtcStr() {
    auto now = std::chrono::system_clock::now();
    auto seconds_since_epoch = std::chrono::duration_cast<std::chrono::seconds>(now.time_since_epoch());
    auto fractional_seconds = std::chrono::duration_cast<std::chrono::microseconds>(
        now.time_since_epoch() - seconds_since_epoch);
    std::time_t now_as_time_t = std::chrono::system_clock::to_time_t(now);
    std::tm utc_tm = *std::gmtime(&now_as_time_t);
    char buffer[80];
    std::strftime(buffer, sizeof(buffer), "%Y.%m.%d %H:%M:%S", &utc_tm);
    std::stringstream ss;
    ss << buffer << '.' << std::setfill('0') << std::setw(3) << fractional_seconds.count() / 1000;
    return ss.str();
}

template <typename T>
std::vector<T> GenPrimesSieve_Eratosthenes(uint64_t const end) {
    // https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes
    if (end <= 2)
        return {};
    uint64_t const cnt = end >> 1;
    std::vector<uint8_t> composites((cnt + 7) / 8);
    auto Get = [&](uint64_t const i){
        return bool((composites[i / 8] >> (i % 8)) & 1);
    };
    auto Set = [&](uint64_t const i){
        composites[i / 8] |= uint8_t(1) << (i % 8);
    };
    std::vector<T> primes = {2};
    uint64_t i = 0, ops = 0;
    for (i = 1; i < cnt; ++i) {
        if (Get(i))
            continue;
        uint64_t const
            p = 2 * i + 1,
            start = (p * p) >> 1;
        primes.push_back(p);
        if (start >= cnt)
            break;
        for (uint64_t j = start; j < cnt; j += p)
            Set(j);
    }
    for (i = i + 1; i < cnt; ++i)
        if (!Get(i))
            primes.push_back(2 * i + 1);
    primes.shrink_to_fit();
    return primes;
}

static std::vector<u64> PrimeSieve(u64 begin, u64 end, std::vector<u32> const & small_primes) {
    if (begin % 2 == 0) ++begin;
    if (end % 2 == 1) --end;
    if (end <= begin) return {};
    
    u64 const cnt = (end - begin + 1) >> 1;
    std::vector<uint8_t> composites((cnt + 7) / 8); composites.shrink_to_fit();
    auto Get = [&](u64 i){ return bool((composites[i / 8] >> (i % 8)) & 1); };
    auto Set = [&](u64 i){ composites[i / 8] |= u8(1) << (i % 8); };
    
    for (u64 const p: small_primes) {
        u64 const
            t = begin - begin % (p * 2) + p,
            start = t < begin ? t + p * 2 : t;
        for (u64 i = (start - begin) >> 1; i < cnt; i += p)
            Set(i);
        if (u64(p) * p >= end) {
            ASSERT_MSG(p < small_primes.back(),
                "To few small primes! Increase Eratosthenes sieving range.");
            break;
        }
    }
    
    std::vector<u64> primes;
    for (u64 i = 0; i < cnt; ++i)
        if (!Get(i))
            primes.push_back(begin + i * 2);
    primes.shrink_to_fit();
    
    return primes;
}

static auto FactorSieve(u64 const begin, u64 const end, std::vector<u32> const & small_primes,
        u64 const fbegin, std::vector<bool> const & filter) {
    std::vector<std::vector<u32>> lo_factors(end - begin); lo_factors.shrink_to_fit();
    std::vector<std::optional<u64>> hi_factors(end - begin); hi_factors.shrink_to_fit();
    std::vector<u64> prods(end - begin, 1); prods.shrink_to_fit();
    
    for (auto const p: small_primes) {
        u64 mp = p;
        while (true) {
            u64 const
                t = begin - begin % mp,
                start = t < begin ? t + mp : t;
            if (start >= t && end + mp >= end) // Check add-overflow
                for (u64 i = start; i < end; i += mp)
                    if (filter[i * 2 + 1 - fbegin]) {
                        lo_factors[i - begin].push_back(p);
                        prods[i - begin] *= p;
                    }
            if ((mp * p) / p != mp)
                break;
            mp *= p;
        }
        if (u64(p) * p >= end) {
            ASSERT_MSG(p < small_primes.back(),
                "To few small primes! Increase Eratosthenes sieving range.");
            break;
        }
    }
    
    for (u64 i = begin; i < end; ++i)
        if (filter[i * 2 + 1 - fbegin]) {
            std::sort(lo_factors[i - begin].begin(), lo_factors[i - begin].end());
            lo_factors[i - begin].shrink_to_fit();
        }
    
    for (u64 i = begin; i < end; ++i)
        if (filter[i * 2 + 1 - fbegin] && prods[i - begin] < i) {
            u64 const f = i / prods[i - begin];
            ASSERT(f * prods[i - begin] == i);
            hi_factors[i - begin] = f;
        }
    
    return std::make_pair(std::move(lo_factors), std::move(hi_factors));
}

static std::atomic<u64> cnt_outputted_ranges = small_irange_start;
static std::mutex output_mutex;

static void ProcessRange(size_t const base_bits, size_t const range_bits,
        u64 const irange, std::vector<u32> const & small_primes) {
    u64 const
        begin = (1ULL << base_bits) + (irange + 0) * (1ULL << range_bits),
        end   = (1ULL << base_bits) + (irange + 1) * (1ULL << range_bits);
    
    auto time_primes = TimeSec();
    auto const ps = PrimeSieve(begin, end, small_primes);
    time_primes = TimeSec() - time_primes;
    
    std::vector<bool> filter(end - begin);
    for (auto p: ps)
        filter.at(p - begin) = true;
    filter.shrink_to_fit();
    u64 const hbegin = begin / 2,
        hend = ((end - 1) / 2 * 2 + 1 > end - 1 ? (end - 2) / 2 : (end - 1) / 2) + 1;
    
    auto time_factor = TimeSec();
    auto const [flo, fhi] = FactorSieve(hbegin, hend, small_primes, begin, filter);
    time_factor = TimeSec() - time_factor;
    
    u64 mem_used = ps.capacity() * sizeof(ps[0]) + filter.capacity() * 1 +
        flo.capacity() * sizeof(flo[0]) + fhi.capacity() * sizeof(fhi[0]);
    for (u64 i = 0; i < flo.size(); ++i)
        mem_used += flo[i].capacity() * sizeof(flo[i][0]);
    
    std::ostringstream file;
    
    u64 hprimes_cnt = 0;
    for (u64 i = begin; i < end; ++i) {
        if (!filter.at(i - begin))
            continue; 
        u64 const q = (i - 1) / 2;
        file << "Range [2^" << base_bits << " + 2^" << range_bits << " * " << irange
            << ", 2^" << base_bits << " + 2^" << range_bits << " * " << (irange + 1)
            << "), prime p = " << i << ", factors of (p - 1) / 2 = " << q << ": ";
        u64 prod = 1;
        auto const & v = flo.at(q - hbegin);
        auto const & h = fhi.at(q - hbegin);
        size_t c = 0;
        for (size_t j = 0; j < v.size(); ++j) {
            ++c;
            prod *= v[j];
            file << v[j] << (j + 1 >= v.size() && !h ? "" : ", ");
        }
        if (h) {
            ++c;
            prod *= *h;
            file << *h;
        }
        file << std::endl;
        if (c == 1)
            ++hprimes_cnt;
        ASSERT_MSG(prod == q, "For p = " + std::to_string(i) + " error in factoring (p - 1) / 2 = " +
                std::to_string(q) + ", product of factors " + std::to_string(prod) + ", factors: " + [&]{
            std::ostringstream ss;
            for (auto e: v)
                ss << e << ", ";
            if (h)
                ss << *h;
            return ss.str();
        }());
        ASSERT(c > 0);
    }
    
    while (true) {
        std::unique_lock<std::mutex> file_lock(output_mutex);
        
        if (irange > cnt_outputted_ranges) {
            file_lock.unlock();
            std::this_thread::yield();
        } else {
            ASSERT_MSG(irange == cnt_outputted_ranges, "irange " + std::to_string(irange) +
                ", cnt_outputted_ranges " + std::to_string(cnt_outputted_ranges));
            std::ofstream ofile(out_fname, std::ios::app);
            ASSERT_MSG(ofile.is_open(), "Can't open file '" + out_fname + "' for write-append!");
            //ofile.rdbuf()->pubsetbuf(nullptr, out_file_buf_size);
            ofile << file.str();
            ofile.close();
            ASSERT_MSG(ofile.good(), "Error closing file '" + out_fname + "'!");
            cnt_outputted_ranges = irange + 1;
            break;
        }
    }
    
    COUT(<< "Processed range [2^" << base_bits << " + 2^" << range_bits << " * " << std::setw(3) << irange
        << ", 2^" << base_bits << " + 2^" << range_bits << " * " << std::setw(3) << (irange + 1)
        << "), prime sieve time " << std::fixed << std::setprecision(1) << std::setw(4) << time_primes
        << " sec, count primes " << ps.size() << ", half factor sieve time " << std::setw(4) << time_factor
        << " sec, count half primes (Sophie Germain) " << hprimes_cnt << ", range mem usage "
        << 1.0 * mem_used / (1 << 20) << " MiB" << std::endl);
}

static void Main() {
    COUT(<< "Creating table of all 32-bit primes! Wait around 1-2 minutes..." << std::endl);
    auto const small_primes = GenPrimesSieve_Eratosthenes<u32>(u32(-1));
    // http://www.umopit.ru/CompLab/primes32eng.htm
    // In total there are 203280221 primes that are 32-bit in size.
    ASSERT_MSG(small_primes.size() == 203280221,
        "small_primes.size() " + std::to_string(small_primes.size()));
    COUT(<< "Small primes count " << small_primes.size()
        << " and mem usage " << std::fixed << std::setprecision(1)
        << 1.0 * small_primes.capacity() * sizeof(small_primes[0]) / (1 << 20)
        << " MiB" << std::endl);
    // Erase file if starting from 0 range
    if (small_irange_start == 0) {
        COUT(<< "Erasing output file!" << std::endl);
        std::ofstream out_cleanup(out_fname);
        ASSERT(out_cleanup.is_open());
    }
    COUT(<< "Starting ranges processing from range index " << small_irange_start << std::endl);
    
    std::vector<std::future<void>> asyncs;
    for (u64 irange = small_irange_start;; ++irange) {
        while (asyncs.size() >= NThreads()) {
            for (size_t j = 0; j < asyncs.size(); ++j)
                if (asyncs[j].wait_for(std::chrono::microseconds(1)) == std::future_status::ready) {
                    asyncs[j].get();
                    asyncs.erase(asyncs.begin() + j);
                    break;
                }
            std::this_thread::yield();
        }
        asyncs.push_back(std::async(std::launch::async, [&, irange]{
            ProcessRange(base_range_bits, small_range_bits, irange, small_primes);
        }));
    }
    
    COUT(<< ": All done!" << std::endl);
}

int main() {
    try {
        Main();
        return 0;
    } catch (std::exception const & ex) {
        COUT(<< "Exception: " << ex.what() << std::endl);
        return -1;
    }
}

Console output:

2024.03.22 12:23:12.300: Creating table of all 32-bit primes! Wait around 1-2 minutes...
2024.03.22 12:27:09.031: Small primes count 203280221 and mem usage 775.5 MiB
2024.03.22 12:27:09.071: Erasing output file!
2024.03.22 12:27:09.076: Starting ranges processing from range index 0
2024.03.22 12:27:42.648: Processed range [2^63 + 2^24 *   0, 2^63 + 2^24 *   1),
    prime sieve time  6.4 sec, count primes 192022, half factor sieve time 13.4 sec,
    count half primes (Sophie Germain) 5871, range mem usage 340.0 MiB
2024.03.22 12:27:44.486: Processed range [2^63 + 2^24 *   1, 2^63 + 2^24 *   2),
    prime sieve time  6.5 sec, count primes 192350, half factor sieve time 13.1 sec,
    count half primes (Sophie Germain) 6014, range mem usage 340.0 MiB

File output.txt:

Range [2^63 + 2^24 * 0, 2^63 + 2^24 * 1), prime p = 9223372036854775907, factors of
    (p - 1) / 2 = 4611686018427387953: 113, 149, 273901883852669
Range [2^63 + 2^24 * 0, 2^63 + 2^24 * 1), prime p = 9223372036854775931, factors of
    (p - 1) / 2 = 4611686018427387965: 5, 13, 13, 13, 4073, 103073081453
Range [2^63 + 2^24 * 0, 2^63 + 2^24 * 1), prime p = 9223372036854775939, factors of
    (p - 1) / 2 = 4611686018427387969: 3, 43, 2389, 16823, 889509163
Range [2^63 + 2^24 * 0, 2^63 + 2^24 * 1), prime p = 9223372036854775963, factors of
    (p - 1) / 2 = 4611686018427387981: 3, 3, 15139, 33846988414231

Fastest way to Factor (Prime-1)/2 for 64-bit Prime?

Tags:

c++

python

prime-factoring

factoring

Extrarius

2 Answers

user448810

Arty

Recent Activity

Donate For Us

Fastest way to Factor (Prime-1)/2 for 64-bit Prime?

Tags:

c++

python

prime-factoring

factoring

Extrarius

2 Answers

user448810

Arty

Related questions

Recent Activity

Donate For Us