Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Did I just prove that sieve of Eratosthenes is less efficient than trial division?

I was trying to compare the run-time speed of two algorithms: A brute-force C program to print prime numbers (10,000 numbers), and a Sieve of Eratosthenes C program (also 10,000 prime numbers).

My measured run-time for the sieve algorithm was: 0.744 seconds

My measured run-time for the brute-force algorithm was: 0.262 seconds

However, I was told that the Sieve of Eratosthenes algorithm is more efficient than the brute-force method, and so I thought it would run faster. So either I'm wrong or my program is flawed (which I doubt).

Therefore, my question is: Since I got the opposite results than what I expected, does this prove that the Sieve of Eratosthenes really is the less efficient algorithm in terms of speed, compared to the trial division?

I'm not sure if it is of any relevance, but I'm using the Dev C++ compiler and Windows 7.

like image 803
C_Intermediate_Learner Avatar asked Aug 16 '13 09:08

C_Intermediate_Learner


3 Answers

TL;DR: comparing the speed of code variants at just one input size is meaningless; comparing empirical orders of growth truly reflects algorithmic nature of the code and will be consistent across different test platforms, for the same test range of input sizes. Comparing absolute speed values is only meaningful for code variants which exhibit same asymptotic or at least local growth behaviour.


It is not enough to measure speed of your two implementations just at one input size. Usually several data points are needed, to assess the run time empirical orders of growth of our code (because the code can be run with varying input sizes). It is found as the logarithm of the ratio of run times, in base of the ratio of input sizes.

So even if at some input code_1 runs 10 times faster than code_2, but its run time doubles with each doubling of the input size, whereas for code_2 it only grows as 1.1x, very soon code_2 will become much much faster than code_1.

So the real measure of an algorithm's efficiency is its run time complexity (and the complexity of its space i.e. memory requirements). And when we measure it empirically, we only measure if for the particular code at hand (at a particular range of input sizes), not for the algorithm itself, i.e. the ideal implementation of it.

In particular, the theoretical complexity of trial division is O(n^1.5 / (log n)^0.5), in n primes produced, usually seen as ~ n^1.40..1.45 empirical order of growth (but it can be ~n^1.3 initially, for smaller input sizes). For the sieve of Eratosthenes it is O(n log n log (log n)), seen usually as ~ n^1.1..1.2. But there certainly are sub-optimal implementations of both the trial division and the sieve of Eratosthenes that run at ~n^2.0 and worse.

So no, this proves nothing. One datapoint is meaningless, at least three are needed to get a "big picture" i.e. to be able to predict with some certainty the run time ⁄ space needed for bigger input sizes.

Prediction with known certainty is what the scientific method is all about.


BTW your run times are very long. The calculation of 10,000 primes should be nearly instantaneous, much less than 1/100th of a second for a C program run on a fast box. Perhaps you're measuring printing time as well. Don't. :)

like image 200
Will Ness Avatar answered Sep 18 '22 13:09

Will Ness


No, elapsed run time is not a standard for measuring efficiency as it varies from platform to platform -- saying "my algorithm ran in 10 seconds" gives little to no information about the algorithm itself. In addition to that, you would need to list the entire environment specs and other processes running at the same time and it would be a huge mess. Hence, the development of the order notations (Big Oh, Little Oh, Omega, etc.).

Efficiency is typically branched into two subsections:

  1. Time efficiency.
  2. Space efficiency.

... where one algorithm may be extremely time efficiency, but very inefficient space-wise. Vice-versa applies. Algorithms are analyzed based on their asymptotic behaviour when scaling the amount of instructions they need to execute for a given input n. This is a very high-level explanation on a field that is meticulously studied by PhD Computer Scientists -- I suggest you read more about it here for the best low-level explanation that you will find.

Note, I am attaching the link for Big Oh notation -- the sister notations can all be found off of that Wikipedia page and it's typically a good place to start. It will get into the difference of space and time efficiency as well.

Small Application of Time Efficiency using Big Oh:

Consider the following recursive function in Racket (would be in Python if I knew it -- best pseudo code I can do):

(define (fn_a input_a)
  (cond
    [(empty? input_a) empty]
    [(empty? (rest input_a)) input_a]
    [(> (first input_a) (fn_a (rest input_a))) (cons (first input_a) empty)]
    [else (fn_a (rest input_a))]))

... we see that: empty?, rest, > and first are all O(1). We also notice that in the worst case, a call is made to fn_a in the third condition and fourth condition on the rest of input_a. We can then write our recurrence relation as, T(n) = O(1) + 2T(n - 1). Looking this up on a recurrence relation chart we see that fn_a is of order O(2^n) because in the worst case, two recursive calls are made.

It's also important to note that, by the formal definition of Big Oh it is also correct (however useless) to state that fn_a is O(3^n). Lot's of algorithms when analyzed are stated using Big Oh however it would be more appropriate to use Big Theta to tighten the bounds, essentially meaning: the lowest, most accurate order with respect to a given algorithm.

Be careful, read the formal definitions!

like image 23
Jacob Pollack Avatar answered Sep 21 '22 13:09

Jacob Pollack


Does a longer run-time mean a less efficient algorithm?

Not necessary. The efficiency of program is measured not only by the time it takes but also by the resources which it is taking. Space is one other factor which is kept in mind while considering the efficiency.

From the wiki:-

For maximum efficiency we wish to minimize resource usage. However, the various resources (e.g. time, space) can not be compared directly, so which of two algorithms is considered to be more efficient often depends on which measure of efficiency is being considered as the most important, e.g. is the requirement for high speed, or for minimum memory usage, or for some other measure?

like image 31
Rahul Tripathi Avatar answered Sep 21 '22 13:09

Rahul Tripathi