Generating prime numbers is a toy problem that I attempt from time to time, especially when experimenting with a new programming language, platform or style. I was thinking of writing a prime number generation algorithm or a primality test using Hadoop (MapReduce). I thought I'd post this question to get tips, references, algorithms and approaches. Although my primary interest is a MapReduce-based algorithm, I wouldn't mind looking at newer Hadoop programming models or, for example, at using PiCloud. I have seen some interesting questions here on prime number generation: here, here and here, but nothing related to a parallel approach caught my eye. Thanks in advance.


Parallel Algorithms for Generating Prime Numbers (possibly using Hadoop's map reduce)

1 Answer

Here's an algorithm that is built on mapping and reducing (folding). It expresses the sieve of Eratosthenes

P = {3,5,7, ...} \ U {{p², p²+2p, p²+4p, ...} | p in P}

for the odd primes (i.e. without the 2). The folding tree is indefinitely deepening to the right, like this:

[Figure: the right-deepening folding tree of merged streams of multiples, https://i.stack.imgur.com/LZdjL.gif]

where each prime number marks a stream of odd multiples of that prime, e.g. for 7: 49, 49+14, 49+28, ..., which are all merged to get all the composite numbers, and then primes are found in the gaps between the composite numbers. It is in Haskell, so the timing is taken care of implicitly by the lazy evaluation mechanism (and the algorithmic adjustment where each comparing node always lets through the first number from the left without demanding a number from the right, because it is guaranteed to be bigger anyway).
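As a rough, bounded illustration of the set formula, the sketch below computes the odd primes up to a limit by removing the multiples streams from the odds. The names `isqrt`, `multiplesUpTo` and `oddPrimesUpTo` are hypothetical helpers, not part of the code in this answer, and `Data.List`'s `(\\)` stands in for an ordered list difference, so this version is quadratic and only meant for small limits.

```haskell
import Data.List ((\\))

-- Integer square root by linear search; exact, and adequate for small n >= 1.
isqrt :: Int -> Int
isqrt n = last (takeWhile (\k -> k * k <= n) [1 ..])

-- Odd multiples of p starting from p*p, cut off at the limit n.
-- For p = 7 this is 49, 49+14, 49+28, ...
multiplesUpTo :: Int -> Int -> [Int]
multiplesUpTo n p = takeWhile (<= n) [p * p, p * p + 2 * p ..]

-- Odd primes up to n: the odds minus all multiples of the odd primes
-- up to sqrt n, which are found by the same function recursively.
oddPrimesUpTo :: Int -> [Int]
oddPrimesUpTo n
  | n < 3     = []
  | otherwise = [3, 5 .. n] \\ concatMap (multiplesUpTo n) seeds
  where
    seeds = oddPrimesUpTo (isqrt n)
```

For instance, `multiplesUpTo 100 7` gives `[49,63,77,91]`, and `oddPrimesUpTo 30` gives `[3,5,7,11,13,17,19,23,29]`.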

Odds can be used instead of odd primes as seeds for multiples generation, to simplify things (with obvious performance implications).
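A minimal sketch of that simplification, assuming the same tree-shaped merge: the streams of multiples are seeded from all odds `[3,5..]` rather than from the primes themselves, which removes the self-reference. The names `primesFromOdds`, `composites` and `unionAll` are chosen here for illustration. Composite seeds such as 9 or 15 only produce numbers that other streams already contain, so the result is the same but more streams get merged.

```haskell
primesFromOdds :: [Int]
primesFromOdds = 2 : 3 : minus [5, 7 ..] composites
  where
    -- One stream of odd multiples per odd k >= 3, prime or not.
    composites = unionAll [[k * k, k * k + 2 * k ..] | k <- [3, 5 ..]]

    -- Tree-shaped merge; the head of the first stream is emitted
    -- without inspecting any of the later streams.
    unionAll ((x:xs):t) = x : union xs (unionAll (pairs t))
    pairs ((x:xs):ys:t) = (x : union xs ys) : pairs t

    -- Ordered merge of two ascending lists, dropping duplicates.
    union (x:xs) (y:ys) = case compare x y of
        LT -> x : union xs (y:ys)
        EQ -> x : union xs ys
        GT -> y : union (x:xs) ys

    -- Ordered difference of two ascending lists.
    minus (x:xs) (y:ys) = case compare x y of
        LT -> x : minus xs (y:ys)
        EQ -> minus xs ys
        GT -> minus (x:xs) ys
```

Here `take 10 primesFromOdds` evaluates to `[2,3,5,7,11,13,17,19,23,29]`.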

The work can naturally be divided into segments between consecutive primes' squares. Haskell code follows, but we can regard it as an executable pseudocode too (where : is a list node lazy constructor, a function call f(x) is written f x, and parentheses are used for grouping only):

primes = 2 : g []
  where
    g ps = 3 : minus [5,7..] (_U [[p*p, p*p+2*p..] | p <- g ps])
    _U ((x:xs):t) = x : union xs (_U (pairs t))
    pairs ((x:xs):ys:t) = (x : union xs ys) : pairs t

union (x:xs) (y:ys) = case compare x y of 
    LT -> x : union  xs (y:ys) 
    EQ -> x : union  xs    ys 
    GT -> y : union (x:xs) ys

minus (x:xs) (y:ys) = case compare x y of
    LT -> x : minus  xs (y:ys) 
    EQ ->     minus  xs    ys 
    GT ->     minus (x:xs) ys
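The division into segments between consecutive primes' squares could be sketched as a map step followed by a concatenation, which is roughly the shape a MapReduce job would take. This is an illustration under assumptions, not the scheduling used by the code above: `sieveSegment`, `segmentTasks` and `extend` are hypothetical names, and each segment is filtered by plain trial division instead of by merging streams of multiples. The input is assumed to be the full ascending list of odd primes up to some prime.

```haskell
-- Odd numbers strictly between lo and hi (both odd) that none of the
-- known primes divides. Trial division is used here for brevity.
sieveSegment :: [Int] -> Int -> Int -> [Int]
sieveSegment known lo hi =
    [n | n <- [lo + 2, lo + 4 .. hi - 2], all (\p -> n `mod` p /= 0) known]

-- One task per segment. The segment between the squares of the k-th and
-- (k+1)-th odd primes needs only the first k odd primes; the initial
-- segment below 9 needs none.
segmentTasks :: [Int] -> [([Int], Int, Int)]
segmentTasks ps =
    ([], 1, 9) : [ (take k ps, p * p, q * q)
                 | (k, p, q) <- zip3 [1 ..] ps (drop 1 ps) ]

-- Map every task to its primes, then concatenate in order. The tasks do
-- not depend on one another, so the map could in principle be distributed.
extend :: [Int] -> [Int]
extend ps = concatMap run (segmentTasks ps)
  where
    run (known, lo, hi) = sieveSegment known lo hi
```

With the odd primes `[3,5,7]` as input, `extend [3,5,7]` yields all odd primes below 49, i.e. `[3,5,7,11,13,17,19,23,29,31,37,41,43,47]`, and the result can be fed back in to reach further.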

A discussion is here. More sophisticated, lazier scheduling is here. Also, this SO answer shows an approximate translation of (related) Haskell code in terms of generators; this one is in Python.

answered Sep 29 '22 19:09

Will Ness


Tags: parallel-processing, primes, hadoop, mpi, number-theory

user1172468
