Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java implementation of Sieve of Eratosthenes that can go past n = 2^32?

Currently I have this prime generator that is limited to n < 2^32-1. I'm not entirely sure how I could expand the limit further, given the limit of elements in an array.

Sieve:

public class Main {

public static void main(String args[]){
    long N = 2000000000;

    // initially assume all integers are prime

    boolean[] isPrime = new boolean[N + 1];
    for (int i = 2; i <= N; i++) {
        isPrime[i] = true;
    }

    // mark non-primes <= N using Sieve of Eratosthenes
    for (int i = 2; i*i <= N; i++) {

        // if i is prime, then mark multiples of i as nonprime
        // suffices to consider mutiples i, i+1, ..., N/i
        if (isPrime[i]) {
            for (int j = i; i*j <= N; j++) {
                isPrime[i*j] = false;
            }
        }
    }
}
}

How could I modify this to go past n = 2^32-1?

like image 274
Arman Avatar asked Sep 21 '15 04:09

Arman


3 Answers

You may use an array of BitSet objects to represent long bit-set. Here's the complete example:

public class Main {
    private static class LongBitSet {
        // max value stored in single BitSet
        private static final int BITSET_SIZE = 1 << 30;

        BitSet[] bitsets;

        public LongBitSet(long limit) {
            bitsets = new BitSet[(int) (limit/BITSET_SIZE+1)];
            // set all bits by default
            for(int i=0; i<bitsets.length; i++) {
                bitsets[i] = new BitSet();
                int max = (int) (i == bitsets.length-1 ?
                          limit % BITSET_SIZE : BITSET_SIZE);
                bitsets[i].set(0, max);
            }
        }

        // clear specified bit
        public void clear(long pos) {
            bitsets[(int) (pos / BITSET_SIZE)].clear((int) (pos % BITSET_SIZE));
        }

        // get the value of the specified bit
        public boolean get(long pos) {
            return bitsets[(int) (pos / BITSET_SIZE)].get((int) (pos % BITSET_SIZE));
        }

        // get the number of set bits
        public long cardinality() {
            long cardinality = 0;
            for(BitSet bs : bitsets) {
                cardinality += bs.cardinality();
            }
            return cardinality;
        }
    }

    public static void main(String args[]) {
        long N = 4000000000L;

        // initially assume all integers are prime

        LongBitSet bs = new LongBitSet(N+1);
        // clear 0 and 1: non-primes
        bs.clear(0);
        bs.clear(1);

        // mark non-primes <= N using Sieve of Eratosthenes
        for (long i = 2; i * i <= N; i++) {
            if (bs.get(i)) {
                for (long j = i; i * j <= N; j++) {
                    bs.clear(i * j);
                }
            }
        }
        System.out.println(bs.cardinality());
    }
}

This program for N = 4_000_000_000L takes around 512Mb of memory, works couple of minutes and prints 189961812 which is correct number of primes below 4 billions according to Wolfram Alpha. If you have enough RAM, you may try setting even bigger N.

like image 56
Tagir Valeev Avatar answered Oct 22 '22 17:10

Tagir Valeev


You can segment the sieve: instead of allocating a single, gigantic array you allocate many small arrays. If you want to find the primes up to 10^10 you might use arrays (or better yet, BitSets) of size 10^6 or so. Then you run the sieve 10^4 times. Each time you run a new segment you need to find out where to start each prime in the sieve, but that's not really too hard.

In addition to allowing much smaller memory use this keeps more of the memory in the cache, so it's often significantly faster.

like image 24
Charles Avatar answered Oct 22 '22 16:10

Charles


I see options:

  1. pack 16 numbers / 1 BYTE

    • remember odd numbers only per each bit
    • use unsigned variable to avoid sign bit waste
  2. use more than 1 table

    • but in 32bit app you are limited by OS capabilities
    • usually to 1/2/4 GB of usable memory
    • such big table usually does not fit inside CACHE so it is not very fast anyway
  3. you can use more approaches at once

    • I combine periodic sieves with found prime list binary search
    • If you choose the sizes right you can even improve performance
    • by better using platform Caching properties
    • see Prime numbers by Eratosthenes quicker sequential than concurrently?
    • the idea is to use sieves to test small divisors
    • and only then check the presence inside prime list
    • it is not requiring that much memory
    • and is pretty fast
  4. to spare memory you can combine 16/32/64 bit variables

    • use small bit width while you can
    • so have prime list divided to 3 groups: small/medium/big
    • if you want also bigints add them as 4th list
like image 20
Spektre Avatar answered Oct 22 '22 16:10

Spektre