Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C - Sieve of Eratosthenes - BitField

I'm about to implement the Sieve of Eratosthenes and have a general question regarding the sieve-array.

I've implemented the sieve quite a few times now (in C) and always used an array of uint8_t (out of <stdint.h>) as the sieve. This is pretty memory inefficient, since 8 bits are used for each number to sieve, even though one bit should be sufficient.

How would I go about this in C? I need an array of bits. I could pretty much create an array of any type (uint8_t, uint16_t, uint32_t, uint64_t) and access the single bits with bit masks, etc.

What data type should I prefer and what operations should I use to access the bits without performance loss?

PS: I don't think this is a duplicate of just a BitArray implementation, since it the question is specific about the Sieve of Eratosthenes, since it's main nature needs to be efficient (not only in memory usage, but in access). I was thinking, that maybe different tricks can be used to make the sieving process more efficient...

like image 247
Matthias Avatar asked Jun 27 '16 17:06

Matthias


People also ask

What is the sieve of Eratosthenes method?

Sieve of Eratosthenes is a method to find the prime numbers and composite numbers among a group of numbers. This method was introduced by Greek Mathematician Eratosthenes in the third century B.C.

What is the time complexity of sieve of Eratosthenes?

The sieve of Eratosthenes is a popular way to benchmark computer performance. The time complexity of calculating all primes below n in the random access machine model is O(n log log n) operations, a direct consequence of the fact that the prime harmonic series asymptotically approaches log log n.

What is sieve of Eratosthenes in Python?

Sieve of Eratosthenes is a method for finding all primes up to (and possibly including) a given natural. This method works well when is relatively small, allowing us to determine whether any natural number less than or equal to is prime or composite.


2 Answers

As mentioned by Weather Vane in his comments, you can save additional space by only considering every other number, since all even number except 2 are non-prime.

So in your bit array, each bit represents an odd number.

Here's an implementation I did a few years back using this technique.

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <time.h>
#include <math.h>
#include <stdint.h>

uint8_t *num;
int count = 0;
FILE *primefile;

int main(int argc, char *argv[])
{
  int i,j,root;
  time_t t;

  if (argc>1) count=atoi(argv[1]);
  if (count < 100) {
    fprintf(stderr,"Invalid number\n");
    exit(1);
  }
  if ((num=calloc(count/16,1))==NULL) {
    perror("calloc failed");
    exit(1);
  }
  if ((primefile=fopen("primes.dat","w"))==NULL) {
    perror("Coundn't open primes.dat");
    exit(1);
  }
  t=time(NULL);
  printf("Start:\t%s",ctime(&t));
  root=floor(sqrt(count));
  // write 2 to the output file
  i=2;
  if (fwrite(&i,sizeof(i),1,primefile)==0) {
    perror("Couldn't write to primes.dat");
  }
  // process larger numbers
  for (i=3;i<count;i+=2) {
    if ((num[i>>4] & (1<<((i>>1)&7)))!=0) continue;
    if (fwrite(&i,sizeof(i),1,primefile)==0) {
      perror("Couldn't write to primes.dat");
    }
    if (i<root) {
      for (j=3*i;j<count;j+=2*i) {
        num[j>>4]|=(1<<((j>>1)&7));
      }
    }
  }
  t=time(NULL);
  printf("End:\t%s",ctime(&t));
  fclose(primefile);
  return 0;
}

Here, num is the bit array, allocated dynamically based on the upper bound of your search. So if you were looking for all primes up to 1000000000 (1 billion), it uses 64000000 (64 million) bytes of memory.

The key expressions are as follows:

For a "normal" bit array:

Set bit i:

num[i>>3] |= (1<<(i&7);
// same as num[i/8] |= (1<<((i%8));

Check bit i:

(num[i>>3] & (1<<(i&7))) != 0
// same as (num[i/8] & (1<<(i%8))) != 0

Since we're only keeping track of every other number, we divide i by 2 (or equivalently, shift right by 1:

num[i>>4] |= (1<<((i>>1)&7);
// same as num[(i/2)/8] |= (1<<(((i/2)%8));

(num[i>>4] & (1<<((i>>1)&7))) != 0
// same as (num[(i/2)/8] & (1<<((i/2)%8))) != 0

In the above code, there are some micro-optimizations where division and modulus by powers of 2 are replaced with bit shifts and bitwise-AND masks, but most compilers should do that for you.

like image 174
dbush Avatar answered Oct 17 '22 23:10

dbush


Largest native types (probably uint64_t) tend to perform the best. You can store your bitmasks in an array or generate them on-site by bitshifting. Contrary to intuition, the on-site generation might perform better due to better caching/memory access characteristics. In any case, it's a good idea to start code it in a fairly general fashion (e.g., define your types macros if you use pure C) and then test the different versions.

Caching (persistently or nonpersistently) some of your results might not be a bad idea either.

like image 20
PSkocik Avatar answered Oct 17 '22 23:10

PSkocik