
Project Euler #14: Why is my TreeMap algorithm slower than brute force?

Background: I first learned C++ and Java in school a number of years ago, but I haven't done much programming over the past 9 or so years as my previous career did not require it.

I decided to look into Project Euler to brush up on my programming and solved problem 14, which asks to find the integer between one and one million with the longest Collatz sequence. (The Collatz sequence proceeds by, given a starting number, multiplying the number by 3 and adding 1 if it's odd, or halving the number if it's even. The process continues until the number reaches 1.)

I first solved the problem using brute force, as illustrated in the code below.

int n;
long temp; // long is necessary since some Collatz sequences go outside the range of int
int[] n_length = new int[1000000];

// Compute the Collatz sequence length of every number from 1 to 1,000,000.
for (n = 0; n < 1000000; n++) {
    temp = n + 1;
    n_length[n] = 1;
    while (temp > 1) {
        n_length[n]++;
        if (temp % 2 == 0) temp = temp / 2;
        else temp = 3 * temp + 1;
    }
}

// Scan the array for the longest sequence.
int max = 0;
int max_index = 0;
for (int i = 0; i < 1000000; i++) {
    if (n_length[i] > max) {
        max = n_length[i];
        max_index = i;
    }
}
System.out.println("The number with the longest Collatz sequence is " + (max_index + 1));

I thought this approach would be inefficient, since it runs the algorithm significantly more often than necessary: any number that appears in a previous number's Collatz sequence already has its sequence length determined, yet this approach recalculates the sequence of every number each time it comes up in another Collatz sequence.

I decided it would be better to store each number in a map as soon as it comes up in a Collatz sequence, so it would only have to be calculated once. To do this, I used a TreeMap, with the numbers as keys and the associated Collatz sequence lengths as values, and used a recursive function to insert each number into the map as soon as it comes up in a Collatz sequence. (See the code below.)

import java.util.TreeMap;

public static TreeMap<Long, Integer> tm = new TreeMap<Long, Integer>();

public static void main(String[] args) {
    tm.put((long)1, 1);
    int maxVal = 1;
    long keyWithMaxVal = 1;
    int maybeMax;
    for (long i = 2; i <= 1000000; i++) {
        // Only call addKey when the sequence length isn't already memoized.
        if (!(tm.containsKey(i))) {
            maybeMax = addKey(i);
            if (maybeMax >= maxVal) {
                maxVal = maybeMax;
                keyWithMaxVal = i;
            }
        }
    }
    System.out.println("The number with the longest Collatz sequence is " + keyWithMaxVal + " with length " + maxVal);
}

// Recursively inserts key and every number on its Collatz path into the map,
// and returns key's sequence length. The while loop runs at most once: after
// the recursive put, containsKey(key) is true and the loop exits.
public static int addKey(long key) {
    while (!(tm.containsKey(key))) {
        if (key % 2 == 0) {
            tm.put(key, 1 + addKey(key / 2));
        } else {
            tm.put(key, 1 + addKey(3 * key + 1));
        }
    }
    return tm.get(key);
}

I used a TreeMap since it automatically sorts the keys on entry, so as I iterate through the for loop I can quickly check whether a key has already been inserted, and avoid calling the addKey method unless I have to. I thought this algorithm would be much, much faster.

However, when I actually ran the code, I was surprised to find that the brute-force algorithm came up with the answer instantaneously, while the recursive TreeMap algorithm took much longer, around 6 seconds. When I modified my programs to go up to 5 million rather than one million, the difference became even more pronounced. I added some code to each program to make sure that the second program was doing less work than the first, and indeed the addKey method was only being called once per key, while the while loop in the first program iterated once for every term of every number's Collatz sequence (i.e. much more often than the number of method calls in the second algorithm).

So why is the first algorithm so much faster than the second? Is it because the array of primitives in the first algorithm requires fewer resources than the TreeMap of wrapper objects in the second? Is searching the map to check whether a key already exists slower than I anticipated (shouldn't it be log time)? Are recursive methods that require large numbers of method calls inherently slower? Or is there something else I'm overlooking?
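
For reference, here is a minimal micro-test I sketched to isolate the lookup cost in particular (this is hypothetical, not one of my original programs; the sizes are arbitrary and JIT warm-up is ignored):

import java.util.TreeMap;

// Hypothetical micro-test: N primitive array reads versus N
// TreeMap<Long, Integer> lookups (boxing the key, walking the tree,
// unboxing the value). Timings are illustrative only.
public class LookupCost {
    public static void main(String[] args) {
        int n = 1_000_000;
        int[] arr = new int[n];
        TreeMap<Long, Integer> map = new TreeMap<>();
        for (int i = 0; i < n; i++) {
            arr[i] = i;
            map.put((long) i, i);
        }

        long t0 = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < n; i++) sum += arr[i];      // direct indexed read
        long t1 = System.nanoTime();
        for (long i = 0; i < n; i++) sum += map.get(i); // box key, O(log n) descent, unbox value
        long t2 = System.nanoTime();

        System.out.println("array: " + (t1 - t0) / 1_000_000 + "ms, "
                + "treemap: " + (t2 - t1) / 1_000_000 + "ms (sum=" + sum + ")");
    }
}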

asked May 16 '15 by nobody


2 Answers

This code tests how long it takes to find the longest Collatz sequence for numbers between 1 and 5 million. It uses three different methods: iterative, recursive, and storing results in a hash map.

The output looks like this:

iterative
time = 2013ms
max n: 3732423, length: 597
number of iterations: 745438133

recursive
time = 2184ms
max n: 3732423, length: 597
number of iterations: 745438133

with hash map
time = 7463ms
max n: 3732423, length: 597
number of iterations: 15865083

So for the hash map solution the number of steps the program has to take is nearly 50 times smaller. Despite that, it is more than 3 times slower, and I think the main reason is that simple mathematical operations on numbers (adding, multiplying, etc.) are a lot faster than operations on hash maps.

import java.util.function.LongUnaryOperator;
import java.util.HashMap;

public class Collatz {
  static int iterations = 0;
  static HashMap<Long, Long> map = new HashMap<>();

  // One Collatz step.
  static long nextColl(long n) {
    if(n % 2 == 0) return n / 2;
    else return n * 3 + 1;
  }

  // Iterative: follow the sequence down to 1, counting every step.
  static long collatzLength(long n) {
    iterations++;
    int length = 1;
    while(n > 1) {
      iterations++;
      n = nextColl(n);
      length++;
    }
    return length;
  }

  // Memoized: cache each length in the map so it is computed only once.
  // (The map is read and written explicitly rather than through
  // computeIfAbsent, because HashMap forbids the mapping function from
  // modifying the map, which a recursive call would do.)
  static long collatzLengthMap(long n) {
    iterations++;
    if(n == 1) return 1;
    Long cached = map.get(n);
    if(cached != null) return cached;
    long length = collatzLengthMap(nextColl(n)) + 1;
    map.put(n, length);
    return length;
  }

  // Recursive, without memoization.
  static long collatzLengthRec(long n) {
    iterations++;
    if(n == 1) return 1;
    else return collatzLengthRec(nextColl(n)) + 1;
  }

  // Runs f over 1..5,000,000, tracking the longest sequence and the time taken.
  static void test(String msg, LongUnaryOperator f) {
    iterations = 0;
    long max = 0, maxN = 0;
    long start = System.nanoTime();
    for(long i = 1; i <= 5000000; i++) {
      long length = f.applyAsLong(i);
      if(length > max) {
        max = length;
        maxN = i;
      }
    }
    long end = System.nanoTime();
    System.out.println(msg);
    System.out.println("time = " + ((end - start)/1000000) + "ms");
    System.out.println("max n: " + maxN + ", length: " + max);
    System.out.println("number of iterations: " + iterations);
    System.out.println();
  }

  public static void main(String[] args) {
    test("iterative", Collatz::collatzLength);
    test("recursive", Collatz::collatzLengthRec);
    test("with hash map", Collatz::collatzLengthMap);
  }
}
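
As a follow-up, here is a sketch (not part of the benchmark above; the array size and class name are assumptions) that memoizes into a primitive long array instead of a HashMap, keeping the reduced step count while avoiding boxing and hashing entirely:

// Sketch: memoization with a primitive array instead of a HashMap.
// lengths[i] caches the Collatz length of i (0 means not yet computed).
// Values that leave the array's range are simply not cached.
public class CollatzArrayMemo {
  static final int LIMIT = 5_000_001;   // assumed cache size, covering 1..5,000,000
  static long[] lengths = new long[LIMIT];

  static long nextColl(long n) {
    return (n % 2 == 0) ? n / 2 : n * 3 + 1;
  }

  static long collatzLength(long n) {
    if (n == 1) return 1;
    if (n < LIMIT && lengths[(int) n] != 0) return lengths[(int) n];
    long length = collatzLength(nextColl(n)) + 1;
    if (n < LIMIT) lengths[(int) n] = length;  // cache only in-range values
    return length;
  }

  public static void main(String[] args) {
    long max = 0, maxN = 0;
    for (long i = 1; i <= 5_000_000; i++) {
      long length = collatzLength(i);
      if (length > max) { max = length; maxN = i; }
    }
    System.out.println("max n: " + maxN + ", length: " + max);
  }
}

This should recover most of the arithmetic-speed advantage of the plain array while still doing far fewer steps than the brute-force version.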
answered Sep 28 '22 by SpiderPig

In addition to the reasons already mentioned in other answers, the primary reason the array-based implementation is so much faster is likely that it benefits heavily from CPU caching effects:

  1. Your two separate small, tight loops will fit entirely in the L0 instruction cache of a modern CPU (on Sandy Bridge it can hold 1,536 decoded micro-ops). Running those two sequentially is going to be much faster than a single loop with more instructions that does not fit into that cache. Given that the second loop is very small, it is likely that its instructions have already been prefetched and decoded as micro-ops and will fit in the Loop Block Buffer (28 micro-ops).

    [Image: the Loop Block Buffer in the CPU front end; source: hardwaresecrets.com]

  2. There is great locality of reference in the data access: both your first and your second loop walk the array sequentially. The hardware prefetcher also helps, because your access pattern is fully predictable. (A small sketch demonstrating this effect follows below.)
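
As a rough illustration of the second point, here is a small hypothetical sketch (names and sizes are arbitrary) that sums the same array sequentially and then in a shuffled order; the arithmetic is identical, only the memory access pattern differs:

import java.util.Random;

// Micro-demo: summing an array in sequential order vs. shuffled order.
// The work per element is the same; only spatial locality changes.
public class LocalityDemo {
  static int[] data = new int[10_000_000];

  public static void main(String[] args) {
    int n = data.length;
    int[] order = new int[n];
    for (int i = 0; i < n; i++) { data[i] = i; order[i] = i; }

    System.out.println("sequential: " + time(order) + "ms");

    // Fisher-Yates shuffle of the index array to destroy spatial locality.
    Random r = new Random(42);
    for (int i = n - 1; i > 0; i--) {
      int j = r.nextInt(i + 1);
      int t = order[i]; order[i] = order[j]; order[j] = t;
    }
    System.out.println("shuffled:   " + time(order) + "ms");
  }

  static long time(int[] order) {
    long start = System.nanoTime();
    long sum = 0;
    for (int idx : order) sum += data[idx];  // access pattern determined by 'order'
    long elapsed = (System.nanoTime() - start) / 1_000_000;
    if (sum < 0) System.out.println(sum);    // keep 'sum' live so the JIT can't drop the loop
    return elapsed;
  }
}

On typical hardware the shuffled pass is noticeably slower, purely because of cache misses and the defeated prefetcher.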


Related to these two topics and more, I recommend watching this excellent skillscast: 95% of performance is about clean representative models by Martin Thompson, which discusses these and other topics in more detail.

answered Sep 28 '22 by Alex