Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient storage of prime numbers

Tags:

For a library, I need to store the first primes numbers up to a limit L. This collection must have a O(1) lookup time (to check whether a number is prime or not) and it must be easy, given a number, to find the next prime number (assuming it is smaller than L).

Given that L is fixed, an Eratostene sieve to generate the list is fine. Right now, I use a packed boolean array to store the list, which contains only entries for odd numbers between 3 and L (inclusive). This takes (L-2)/2 bits of memory. I would like to be able to statically increase L without using more memory.

Is there a data structure using less memory with similar properties? Or with at least the constant lookup time? (odd numbers can then be enumerated until we get a prime)

(the language I wrote this in is Factor but this question would be the same in any language which has built-in or easily programmable packed bit arrays)

like image 597
Samuel Tardieu Avatar asked Jun 23 '09 12:06

Samuel Tardieu


3 Answers

You can explicitly check more prime numbers to remove redundancy.

At the moment you do this only for two, by checking for divisibility by two explicitly and then storing only for odd numbers whether they are prime.

For 2 and 3 you get remainders 0 to 5, of which only 1 and 5 are not divisible by two or three and can lead to a prime number, so you are down to 1/3.

For 2, 3, and 5 you get 8 numbers out of 30, which is nice to store in a byte.

This is explained in more detail here.

like image 124
starblue Avatar answered Sep 24 '22 08:09

starblue


An alternative to packed bitmaps and wheels - but equally efficient in certain contexts - is storing the differences between consecutive primes. If you leave out the number 2 as usual then all differences are even. Storing difference/2 you can get up to 2^40ish regions (just before 1999066711391) using byte-sized variables.

The primes up 2^32 require only 194 MByte, compared to 256 MByte for an odds-only packed bitmap. Iterating over delta-stored primes is much faster than for wheeled storage, which includes the modulo-2 wheel known as odds-only bitmap.

For ranges from 1999066711391 onwards, bigger cell size or variable-length storage are needed. The latter can be extremely efficient even if very simple schemes are used (e.g. keep adding until a byte < 255 has been added, like in LZ4-style compression), because of the extremely low frequency of gaps longer than 510/2.

For efficiency's sake it is best to divide the range into sections (pages) and manage them B-Tree style.

Entropy-coding the differences (Huffmann or arithmetic coding) cuts permanent storage requirements to a bit less than half, which is close to the theoretical optimum and better than lists or wheels compressed using the best available packers.

If the data is stored uncompressed then it is still much more compact than files of binary or textual numbers, by an order of magnitude or more. With a B-Tree style index in place it is easy to simply map sections into memory as needed and iterate over them at blazing speed.

like image 30
DarthGizka Avatar answered Sep 26 '22 08:09

DarthGizka


At the moment you are treating 2 as special case and then having an array where every odd number is mapped to an element in the array (with some odd numbers being prime). You could improve on this by treating 2 and 3 as special cases recognising that the rest of the prime numbers are in the form 6n+1 or 6n-1 (that is for all primes p where p > 3, p mod 6 = 1 or 5). This can be further generalised - see Wikipedia. For all primes p > 5, p mod 30 = 1, 7, 11, 13, 17, 19, 23 or 29. You could keep going with this and reduce the memory needed at the expense of processing time (although it will still be O(1), just a slower O(1)).

like image 4
David Johnstone Avatar answered Sep 25 '22 08:09

David Johnstone