
C performance and compiling options

I've got two similar implementations (Java and C) of a trivial algorithm, selection sort.

public interface SortingAlgorithm {

    public void sort(int[] a);
}

public class SelectionSort implements SortingAlgorithm {

    @Override
    public void sort(int[] a) {
        for (int i = 0; i < a.length; i++) {
            int lowerElementIndex = i;
            for (int j = i + 1; j < a.length; j++) {
                if (a[j] < a[lowerElementIndex]) {
                    lowerElementIndex = j;
                }
            }
            swap(a, lowerElementIndex, i);
        }
    }

    private void swap(int[] a, int i, int j) {
        if (i == j) {
            return;
        }
        int temp = a[i];
        a[i] = a[j];
        a[j] = temp;
    }
}

and the C one:

inline void swap(int* a, int i, int j);

void s_sort(int* a, int size) {
  int i;
  for (i = 0; i < size; i++) {
    int lowerElementIndex = i, j;
    for (j = i + 1; j < size; j++) {
      if (a[j] < a[lowerElementIndex]) {
        lowerElementIndex = j;
      }
    }
    swap(a, lowerElementIndex, i);
  }
}

inline void swap(int* a, int i, int j) {
  if (i == j) {
    return;
  }
  int temp = a[i];
  a[i] = a[j];
  a[j] = temp;
}

Now, I tried testing them on a large array (100,000 random ints). At first the results were:

Java: ~17 sec (compiled and executed with the Oracle JDK/JVM)

C: ~22 sec (compiled with gcc 4.8 without any optimization)

Of course, I then tried to optimize my C version through compiler flags. The results (reporting only the flags) are:

-O1: ~18.4

-O2: ~18.4

-O{3-9}: ~20.9

Now, my first question is: which flags should I use to compile?

So I read the GNU manual section on optimizations. Adding -march=native didn't help. After some time spent trying other options, I came across the -fprofile-arcs option. Adding it to my flags made my code finish the test in about 11 seconds! However, some files then appeared in my folders: the results of the profiling. As I understand it, I should feed them back in with -fbranch-probabilities and recompile the code. Recompiling gives ~18.5 sec again. And this is what I really want to ask.

How is it possible for my program to run so fast when it has to write files and collect profiling information, yet run 1.5 times slower when it doesn't?

I forgot to mention that I'm on an old PC with an Intel Celeron @ 2.8 GHz running Linux (Fedora 20 with Xfce). If you need other information about the hardware, just ask! ;)

Edit: The code I use for the test is:

Java:

public class Test {

    public static void main(String[] args) {
        int[] a = new int[100000];
        int[] a2 = new int[100000];
        for (int i = 0; i < a.length; i++) {
            a[i] = (int)(Math.random()*100000);
            a2[i] = a[i];
        }
        SelectionSort s = new SelectionSort();
        InsertionSort s1 = new InsertionSort();
        double start = System.nanoTime();
        s.sort(a);
        double end = System.nanoTime();
        double time = (end-start)/1000000000.0; 
        System.out.println("Selection: "+time);
        start = System.nanoTime();
        s1.sort(a2);
        end = System.nanoTime();
        time = (end-start)/1000000000.0;
        System.out.println("Insertion: "+time);
    }
}

And the c:

#include "insertion_sort.h"
#include "selection_sort.h"
#include <time.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int main() {
  int max = 100000, i;
  srand(time(NULL));

  int array[100000], array2[100000];
  for(i=0; i<100000; i+=1) {
    array[i] = rand()%100000;
  }

  memcpy(array2, &array[0], 100000 * sizeof(int));

  clock_t inizio = clock();
  s_sort(array, max);
  clock_t fine = clock();
  float tempoEsecuzione = (float)(fine - inizio) / CLOCKS_PER_SEC;
  printf("Selection: %2.3f\n", tempoEsecuzione);

  inizio = clock();
  i_sort(array2, max);
  fine = clock();
  tempoEsecuzione = (float)(fine - inizio) / CLOCKS_PER_SEC;
  printf("Insertion: %2.3f\n", tempoEsecuzione);
  return 0;
}

The code references an insertion sort function that I haven't included in the question, because there (as expected) Java ran slower than C.

asked Jan 10 '14 by marco6


1 Answer

And this is what I really want to ask.

How is it possible for my program to run so fast if it has to write files and collect profiling information and instead it runs 1.5 times slower when it hasn't?

Yes, that's the real question here. Mentioning all that Java comparison stuff just adds noise.

I could reproduce the weird behavior on my machine with gcc 4.7.2. Not surprisingly, the hot path of the code is the inner for loop:

for (j = i + 1; j < size; j++) {
  if (a[j] < a[lowerElementIndex]) {
    lowerElementIndex = j;
  }
}

The only relevant difference in the corresponding generated assembly code is:

Fast case:

    cmpl    %esi, %ecx
    jge .L3
    movl    %ecx, %esi
    movslq  %edx, %rdi
.L3:

Slow case:

cmpl    %ecx, %esi
cmovl   %edx, %edi
cmovl   %esi, %ecx

The first (fast) case can benefit greatly from branch prediction, but the second (slow) case apparently cannot. Sorted or randomly shuffled arrays do not cause many branch mispredictions, so the first code snippet is optimal in that case.

As it turns out, it is actually difficult to construct a dataset that causes many branch mispredictions in selection sort (this was pointed out by Yakk; see also my attempts to create an evil dataset, all of which have failed so far).

-fprofile-arcs happens to disable tree vectorization, which seems to be responsible for generating the slow-case code. A cleaner way to disable tree vectorization is to pass the -fno-tree-vectorize flag.

clang 3.4 also generates the fast-case code, without any special flags. The Java code, without warm-up, runs 2.4x slower than the C code. (Since that wasn't the question, I haven't looked into improving the Java performance.)

answered Sep 28 '22 by Ali