Is there an (efficient) iterator to generate prime numbers in Julia? The inbuilt function primes[N]
generates all primes up to N
at once, rather than as required, and may not be usable when N
is very large, or unknown.
You can filter a counter going through the (big) ints (the Base.Count{BigInt}
iterator) using the probabilistic primality test
iterprimes = filter(isprime,countfrom(big(2),1))
Then for example
julia> collect(take(iterprimes, 5))
5-element Array{Any,1}:
2
3
5
7
11
This is not so effective in total as a sieve but does not keep a huge structure in memory. I recall that isprime
has at least no false positives up to 2^64 with the standard choice of repetitions.
Edit:
A second possibility is to generate (see Generator
) chunks of primes(N*(i-1)+1,N*i)
and Base.flatten
them into a single list:
Base.flatten(primes(1000000*(i-1)+1,1000000*i) for i in countfrom(1))
On this machine this iterator actually beats plain primes
for computing the first 10^9 primes.
Edit 2:
An Iterator using gmpz
's nextprime
.
type
PrimeIter
end
function nextprime(y::BigInt)
x = BigInt()
ccall((:__gmpz_nextprime,:libgmp), Void, (Ptr{BigInt},Ptr{BigInt}), &x, &y)
x
end
Base.start(::PrimeIter) = big(2)
Base.next(::PrimeIter, state) = state, nextprime(state)
Base.done(::PrimeIter, _) = false
Base.iteratorsize(::PrimeIter) = Base.IsInfinite()
> first(drop(PrimeIter(), 10^5))
1299721
You can check out Lazy.jl, which gives you prime iteration on demand. It works for an unknown large number. The assumption is that you want to use all prime numbers lesser than an upper bound, and have the space to store them.
Quote from their readme:-
# isprime defined in terms of the prime numbers:
isprime(n) =
@>> primes begin
takewhile(x -> x<=sqrt(n))
map(x -> n % x == 0)
any; !
end
# the prime numbers defined in terms of isprime:
primes = filter(isprime, range(2));
take(20, primes)
#> (2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71)
To explain the code, firstly isprime
function is defined using the list of all primes primes
(which haven't been defined yet at that point in time), by taking all primes lesser than the square root of n
, check whether any of them divides n
, and negate the result logically.
Then prime
is defined as a filter
of isprime
over all integers from 2 onward.
To get all prime numbers below n
, you can just run @>> primes takewhile(p -> p <= n)
instead of take
.
You don't say what you consider a reasonable range for your iterator or how long you would like to process it. Such algorithms generally come in two flavors, as in: A) short but slower (a range of a million in the order of a second), and B) more complex but much faster (about a hundred times faster than A). Following are examples of each.
A) a version of an iterator based on the built-in "Primes" package (pkg add "Primes") and specifically the nextprime
function:
using Primes: nextprime
mutable struct PrimesGen
lastprime :: UInt64
PrimesGen() = new()
end
Base.eltype(::Type{PrimesGen}) = Int64
Base.IteratorSize(::PrimesGen) = Base.IsInfinite()
function Base.iterate(PG::PrimesGen, st::UInt64 = UInt64(1)) # :: Union{Nothing,Tuple{UInt64,UInt64}}
next = nextprime(st + 1)
next, next
end
EDIT_ADD: the above code is a slightly faster solution than @mschauer's first solution (updated to current Julia version 1.0) due to his multiple nested iterators, as follows:
using Primes: isprime
PrimesGen() = Iterators.filter(isprime, Iterators.countfrom(UInt64(2)))
but it's short and can be used the same way... END_EDIT_ADD
with which you can do something like the following:
using Printf
@time let sm = 0
for p in PrimesGen() p >= 2_000_000 && break; sm += p end
Printf.@printf("%d\n",sm)
end
to produce the following:
142913828922 0.651754 seconds (327.05 k allocations: 4.990 MiB)
This is adequate to use for these smaller ranges such as solving Euler Problem 10 above (run on an Intel x5-Z8350 at 1.92 Gigahertz).
The above iterator actually has an "infinite" range of the UInt64
number but that won't be reached for over 300 thousand years so we don't really need to worry about it...
B) For "industrial strength" problems involving ranges of a billion or more, one needs an iterator of (or direct function calls into) an implementation of the page segmented Sieve of Eratosthenes at about a hundred times faster, with an implementation as follows:
const Prime = UInt64
const BasePrime = UInt32
const BasePrimesArray = Array{BasePrime,1}
const SieveBuffer = Array{UInt8,1}
# contains a lazy list of a secondary base primes arrays feed
# NOT thread safe; needs a Mutex gate to make it so...
abstract type BPAS end # stands in for BasePrimesArrays, not defined yet
mutable struct BasePrimesArrays <: BPAS
thunk :: Union{Nothing,Function} # problem with efficiency - untyped function!!!!!!!!!
value :: Union{Nothing,Tuple{BasePrimesArray, BPAS}}
BasePrimesArrays(thunk::Function) = new(thunk)
end
Base.eltype(::Type{BasePrimesArrays}) = BasePrime
Base.IteratorSize(::Type{BasePrimesArrays}) = Base.SizeUnknown() # "infinite"...
function Base.iterate(BPAs::BasePrimesArrays, state::BasePrimesArrays = BPAs)
if state.thunk !== nothing
newvalue :: Union{Nothing,Tuple{BasePrimesArray, BasePrimesArrays}} =
state.thunk() :: Union{Nothing,Tuple{BasePrimesArray
, BasePrimesArrays}}
state.value = newvalue
state.thunk = nothing
return newvalue
end
state.value
end
# count the number of zero bits (primes) in a byte array,
# also works for part arrays/slices, best used as an `@view`...
function countComposites(cmpsts::AbstractArray{UInt8,1})
foldl((a, b) -> a + count_zeros(b), cmpsts; init = 0)
end
# converts an entire sieved array of bytes into an array of UInt32 primes,
# to be used as a source of base primes...
function composites2BasePrimesArray(low::Prime, cmpsts::SieveBuffer)
limiti = length(cmpsts) * 8
len :: Int = countComposites(cmpsts)
rslt :: BasePrimesArray = BasePrimesArray(undef, len)
i :: Int = 0
j :: Int = 1
@inbounds(
while i < limiti
if cmpsts[i >>> 3 + 1] & (1 << (i & 7)) == 0
rslt[j] = low + i + i
j += 1
end
i += 1
end)
rslt
end
# sieving work done, based on low starting value for the given buffer and
# the given lazy list of base prime arrays...
function sieveComposites(low::Prime, buffer::Array{UInt8,1},
bpas::BasePrimesArrays)
lowi :: Int = (low - 3) ÷ 2
len :: Int = length(buffer)
limiti :: Int = len * 8 - 1
nexti :: Int = lowi + limiti
for bpa::BasePrimesArray in bpas
for bp::BasePrime in bpa
bpint :: Int = bp
bpi :: Int = (bpint - 3) >>> 1
starti :: Int = 2 * bpi * (bpi + 3) + 3
starti >= nexti && return
if starti >= lowi starti -= lowi
else
r :: Int = (lowi - starti) % bpint
starti = r == 0 ? 0 : bpint - r
end
lmti :: Int = limiti - 40 * bpint
@inbounds(
if bpint <= (len >>> 2) starti <= lmti
for i in 1:8
if starti > limiti break end
mask = convert(UInt8,1) << (starti & 7)
c = starti >>> 3 + 1
while c <= len
buffer[c] |= mask
c += bpint
end
starti += bpint
end
else
c = starti
while c <= limiti
buffer[c >>> 3 + 1] |= convert(UInt8,1) << (c & 7)
c += bpint
end
end)
end
end
return
end
# starts the secondary base primes feed with minimum size in bits set to 4K...
# thus, for the first buffer primes up to 8293,
# the seeded primes easily cover it as 97 squared is 9409.
function makeBasePrimesArrays() :: BasePrimesArrays
cmpsts :: SieveBuffer = Array{UInt8,1}(undef, 512)
function nextelem(low::Prime, bpas::BasePrimesArrays) ::
Tuple{BasePrimesArray, BasePrimesArrays}
# calculate size so that the bit span is at least as big as the
# maximum culling prime required, rounded up to minsizebits blocks...
reqdsize :: Int = 2 + isqrt(1 + low)
size :: Int = (reqdsize ÷ 4096 + 1) * 4096 ÷ 8 # size in bytes
if size > length(cmpsts) cmpsts = Array{UInt8,1}(undef, size) end
fill!(cmpsts, 0)
sieveComposites(low, cmpsts, bpas)
arr :: BasePrimesArray = composites2BasePrimesArray(low, cmpsts)
next :: Prime = low + length(cmpsts) * 8 * 2
arr, BasePrimesArrays(() -> nextelem(next, bpas))
end
# pre-seeding breaks recursive race,
# as only known base primes used for first page...
preseedarr :: BasePrimesArray = # pre-seed to 100, can sieve to 10,000...
[ 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41
, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97
]
nextfunc :: Function = () ->
(nextelem(convert(Prime,101), makeBasePrimesArrays()))
firstfunc :: Function = () -> (preseedarr, BasePrimesArrays(nextfunc))
BasePrimesArrays(firstfunc)
end
# an iterator over successive sieved buffer composite arrays,
# returning a tuple of the value represented by the lowest possible prime
# in the sieved composites array and the array itself;
# array has a 16 Kilobytes minimum size (CPU L1 cache), but
# will grow so that the bit span is larger than the
# maximum culling base prime required, possibly making it larger than
# the L1 cache for large ranges, but still reasonably efficient using
# the L2 cache: very efficient up to about 16e9 range;
# reasonably efficient to about 2.56e14 for two Megabyte L2 cache = > 1 week...
struct PrimesPages
baseprimes :: BasePrimesArrays
PrimesPages() = new(makeBasePrimesArrays())
end
Base.eltype(::Type{PrimesPages}) = SieveBuffer
Base.IteratorSize(::Type{PrimesPages}) = Base.IsInfinite()
function Base.iterate(PP::PrimesPages,
state :: Tuple{Prime,SieveBuffer} =
( convert(Prime,3), Array{UInt8,1}(undef,16384) ))
(low, cmpsts) = state
# calculate size so that the bit span is at least as big as the
# maximum culling prime required, rounded up to minsizebits blocks...
reqdsize :: Int = 2 + isqrt(1 + low)
size :: Int = (reqdsize ÷ 131072 + 1) * 131072 ÷ 8 # size in bytes
if size > length(cmpsts) cmpsts = Array{UInt8,1}(undef, size) end
fill!(cmpsts, 0)
sieveComposites(low, cmpsts, PP.baseprimes)
newlow :: Prime = low + length(cmpsts) * 8 * 2
( low, cmpsts ), ( newlow, cmpsts )
end
function countPrimesTo(range::Prime) :: Int64
range < 3 && ((range < 2 && return 0) || return 1)
count :: Int64 = 1
for ( low, cmpsts ) in PrimesPages() # almost never exits!!!
if low + length(cmpsts) * 8 * 2 > range
lasti :: Int = (range - low) ÷ 2
count += countComposites(@view cmpsts[1:lasti >>> 3])
count += count_zeros(cmpsts[lasti >>> 3 + 1] |
(0xFE << (lasti & 7)))
return count
end
count += countComposites(cmpsts)
end
count
end
which can be called as follows:
using Printf
@time let sm = 0
for p in PrimesPaged() p >= 2_000_000 && break; sm += p end
Printf.@printf("%d\n",sm)
end
to produce the following:
142913828922 0.016245 seconds (60 allocations: 23.891 KiB)
but than is barely enough to "warm it up"; it can be called with an argument a hundred times as large to produce the following:
1075207199997334 1.381198 seconds (2.35 k allocations: 103.875 KiB)
and can count all the primes to a billion with the following code:
println(@time let count = 0
for p in PrimesPaged()
p > 1_000_000_000 && break
count += 1
end; count end)
to produce the following:
6.802044 seconds (11.51 k allocations: 396.734 KiB) 50847534
However, it takes much longer to iterate over the sieved primes than it does to sieve them in the first place. This can be shown by calling the provided optimized counting function to eliminate most of the enumeration time as follows:
println(@time countPrimesTo(Prime(1_000_000_000)))
to produce the following:
1.959057 seconds (65 allocations: 39.266 KiB) 50847534
In a similar fashion, one could write a sumPrimesTo
function to sum the sieved primes to a billion in just a little more time (say twice)...
All tests here are run on the same x5-Z8350 CPU at 1.92 Gigahertz.
This shows that for truly huge problems, one shouldn't use an iterator and should rather use custom functions that operate directly on the culled page segments as the countPrimesTo
function does here. When this is done, it would be worth making further optimizations such as maximum wheel factorization (far a gain in a further four times in speed for the sieving) and multi-threading (for a gain of the number of effective CPU cures used (not including Hyper Threading ones) with the end result not being much slower if any than Kim Walich's "primesieve".
Again, this has the same UInt64
"infinite" limit, but it still will take hundreds of years to get there so we still don't have to worry about it.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With