
How fast is D compared to C++?


To enable all optimizations and disable all safety checks, compile your D program with the following DMD flags:

-O -inline -release -noboundscheck

EDIT: I've tried your programs with g++, dmd and gdc. dmd does lag behind, but gdc achieves performance very close to g++. The command line I used was gdmd -O -release -inline (gdmd is a wrapper around gdc that accepts dmd options).

Looking at the assembler listing, it looks like neither dmd nor gdc inlined scalar_product, but g++/gdc did emit MMX instructions, so they might be auto-vectorizing the loop.


One big thing that slows D down is a subpar garbage collection implementation. Benchmarks that don't heavily stress the GC will show very similar performance to C and C++ code compiled with the same compiler backend. Benchmarks that do heavily stress the GC will show that D performs abysmally. Rest assured, though, this is a single (albeit severe) quality-of-implementation issue, not a baked-in guarantee of slowness. Also, D gives you the ability to opt out of GC and tune memory management in performance-critical bits, while still using it in the less performance-critical 95% of your code.
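
To make the opt-out concrete, here is a minimal sketch of a hot function that never touches the GC. The function name sumOfSquares is made up for illustration; @nogc, GC.disable/GC.enable (core.memory) and malloc/free (core.stdc.stdlib) are standard. Whether this helps in a given program depends on how much that code actually allocates.

```d
import core.memory : GC;
import core.stdc.stdlib : malloc, free;

// Sums the squares 0 .. n-1 using a manually managed buffer.
// @nogc makes the compiler reject any GC allocation inside this function.
long sumOfSquares(size_t n) @nogc nothrow
{
    if (n == 0)
        return 0;
    auto p = cast(long*) malloc(n * long.sizeof);
    if (p is null)
        assert(0, "out of memory"); // sketch-level error handling
    scope(exit) free(p);            // released on every exit path

    long[] buf = p[0 .. n];         // slice over malloc'd memory, no GC involved
    foreach (i, ref v; buf)
        v = cast(long)(i * i);

    long total = 0;
    foreach (v; buf)
        total += v;
    return total;
}

void main()
{
    import std.stdio : writeln;

    GC.disable();            // suspend automatic collections for the hot section
    scope(exit) GC.enable(); // restore normal behaviour afterwards

    writeln(sumOfSquares(4)); // 0 + 1 + 4 + 9
}
```

The rest of the program can keep using GC-allocated arrays and strings as usual; only the annotated function is restricted.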

I've put some effort into improving GC performance lately and the results have been rather dramatic, at least on synthetic benchmarks. Hopefully these changes will be integrated into one of the next few releases and will mitigate the issue.


This is a very instructive thread, thanks for all the work to the OP and helpers.

One note - this test is not assessing the general question of abstraction/feature penalty or even that of backend quality. It focuses on virtually one optimization (loop optimization). I think it's fair to say that gcc's backend is somewhat more refined than dmd's, but it would be a mistake to assume that the gap between them is as large for all tasks.


Definitely seems like a quality-of-implementation issue.

I ran some tests with the OP's code and made some changes. I actually got D going faster for LDC/clang++, operating on the assumption that arrays must be allocated dynamically (xs and associated scalars). See below for some numbers.

Questions for the OP

Is it intentional that the same seed be used for each iteration of C++, while not so for D?

Setup

I have tweaked the original D source (dubbed scalar.d) to make it portable between platforms. This only involved changing the type of the numbers used to access and modify the size of arrays.

After this, I made the following changes:

  • Used uninitializedArray to avoid default inits for scalars in xs (probably made the biggest difference). This is important because D normally default-inits everything silently, which C++ does not.
  • Factored out printing code and replaced writefln with writeln
  • Changed imports to be selective
  • Used pow operator (^^) instead of manual multiplication for final step of calculating average
  • Removed the size_type and replaced appropriately with the new index_type alias

...thus resulting in scalar2.d (pastebin):

    import std.stdio : writeln;
    import std.datetime : Clock, Duration;
    import std.array : uninitializedArray;
    import std.random : uniform;

    alias result_type = long;
    alias value_type = int;
    alias vector_t = value_type[];
    alias index_type = typeof(vector_t.init.length);// Make index integrals portable - Linux is ulong, Win8.1 is uint

    immutable long N = 20000;
    immutable int size = 10;

    // Dot product over the first `size` elements (this version still uses plain for loops)
    value_type scalar_product(in ref vector_t x, in ref vector_t y) { // "in" is the same as "const" here
      value_type res = 0;
      for(index_type i = 0; i < size; ++i)
        res += x[i] * y[i];
      return res;
    }

    int main() {
      auto tm_before = Clock.currTime;
      auto countElapsed(in string taskName) { // Factor out printing code
        writeln(taskName, ": ", Clock.currTime - tm_before);
        tm_before = Clock.currTime;
      }

      // 1. allocate and fill randomly many short vectors
      vector_t[] xs = uninitializedArray!(vector_t[])(N);// Avoid default inits of inner arrays
      for(index_type i = 0; i < N; ++i)
        xs[i] = uninitializedArray!(vector_t)(size);// Avoid more default inits of values
      countElapsed("allocation");

      for(index_type i = 0; i < N; ++i)
        for(index_type j = 0; j < size; ++j)
          xs[i][j] = uniform(-1000, 1000);
      countElapsed("random");

      // 2. compute all pairwise scalar products:
      result_type avg = 0;
      for(index_type i = 0; i < N; ++i)
        for(index_type j = 0; j < N; ++j)
          avg += scalar_product(xs[i], xs[j]);
      avg /= N ^^ 2;// Replace manual multiplication with pow operator
      writeln("result: ", avg);
      countElapsed("scalar products");

      return 0;
    }
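
The default-init point from the change list can be looked at in isolation. Below is a small sketch (the helper names makeDefault and makeFilled are made up for illustration) contrasting a normal allocation with std.array.uninitializedArray. It only shows the semantics; how much time the skipped init pass saves depends on the array sizes and the compiler.

```d
import std.array : uninitializedArray;
import std.stdio : writeln;

// Allocates with `new`: every element is set to int.init (0) first.
int[] makeDefault(size_t n)
{
    return new int[](n);
}

// Allocates without the init pass, then writes every element exactly once.
// Reading an element before it is written would yield garbage.
int[] makeFilled(size_t n)
{
    auto raw = uninitializedArray!(int[])(n);
    foreach (i, ref v; raw)
        v = cast(int) i * 10;
    return raw;
}

void main()
{
    writeln(makeDefault(4)); // [0, 0, 0, 0]
    writeln(makeFilled(4));  // [0, 10, 20, 30]
}
```

In the benchmark every element is overwritten by uniform() anyway, so the zeroing pass done by a normal allocation is pure overhead there.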

After testing scalar2.d (which prioritized optimization for speed), out of curiosity I replaced the loops in main with foreach equivalents, and called it scalar3.d (pastebin):

    import std.stdio : writeln;
    import std.datetime : Clock, Duration;
    import std.array : uninitializedArray;
    import std.random : uniform;

    alias result_type = long;
    alias value_type = int;
    alias vector_t = value_type[];
    alias index_type = typeof(vector_t.init.length);// Make index integrals portable - Linux is ulong, Win8.1 is uint

    immutable long N = 20000;
    immutable int size = 10;

    // Unchanged from scalar2.d; the foreach replacements are in main below
    value_type scalar_product(in ref vector_t x, in ref vector_t y) { // "in" is the same as "const" here
      value_type res = 0;
      for(index_type i = 0; i < size; ++i)
        res += x[i] * y[i];
      return res;
    }

    int main() {
      auto tm_before = Clock.currTime;
      auto countElapsed(in string taskName) { // Factor out printing code
        writeln(taskName, ": ", Clock.currTime - tm_before);
        tm_before = Clock.currTime;
      }

      // 1. allocate and fill randomly many short vectors
      vector_t[] xs = uninitializedArray!(vector_t[])(N);// Avoid default inits of inner arrays
      foreach(ref x; xs)
        x = uninitializedArray!(vector_t)(size);// Avoid more default inits of values
      countElapsed("allocation");

      foreach(ref x; xs)
        foreach(ref val; x)
          val = uniform(-1000, 1000);
      countElapsed("random");

      // 2. compute all pairwise scalar products:
      result_type avg = 0;
      foreach(const ref x; xs)
        foreach(const ref y; xs)
          avg += scalar_product(x, y);
      avg /= N ^^ 2;// Replace manual multiplication with pow operator
      writeln("result: ", avg);
      countElapsed("scalar products");

      return 0;
    }

I compiled each of these tests using an LLVM-based compiler, since LDC seems to be the best option for D compilation in terms of performance. On my x86_64 Arch Linux installation I used the following packages:

  • clang 3.6.0-3
  • ldc 1:0.15.1-4
  • dtools 2.067.0-2

I used the following commands to compile each:

  • C++: clang++ scalar.cpp -o"scalar.cpp.exe" -std=c++11 -O3
  • D: rdmd --compiler=ldc2 -O3 -boundscheck=off <sourcefile>

Results

The results (screenshot of raw console output) for each version of the source are as follows:

  1. scalar.cpp (original C++):

    allocation: 2 ms
    random generation: 12 ms
    result: 29248300000
    time: 2582 ms

    C++ sets the standard at 2582 ms.

  2. scalar.d (modified OP source):

    allocation: 5 ms, 293 μs, and 5 hnsecs
    random: 10 ms, 866 μs, and 4 hnsecs
    result: 53237080000
    scalar products: 2 secs, 956 ms, 513 μs, and 7 hnsecs

    This ran for ~2957 ms. Slower than the C++ implementation, but not by much.

  3. scalar2.d (index/length type change and uninitializedArray optimization):

    allocation: 2 ms, 464 μs, and 2 hnsecs
    random: 5 ms, 792 μs, and 6 hnsecs
    result: 59
    scalar products: 1 sec, 859 ms, 942 μs, and 9 hnsecs

    In other words, ~1860 ms. So far this is in the lead.

  4. scalar3.d (foreaches):

    allocation: 2 ms, 911 μs, and 3 hnsecs
    random: 7 ms, 567 μs, and 8 hnsecs
    result: 189
    scalar products: 2 secs, 182 ms, and 366 μs

    ~2182 ms is slower than scalar2.d, but faster than the C++ version.

Conclusion

With the correct optimizations, the D implementation actually went faster than its equivalent C++ implementation using the LLVM-based compilers available. The current gap between D and C++ for most applications seems only to be based on limitations of current implementations.


dmd is the reference implementation of the language and thus most work is put into the frontend to fix bugs rather than optimizing the backend.

"in" is faster in your case cause you are using dynamic arrays which are reference types. With ref you introduce another level of indirection (which is normally used to alter the array itself and not only the contents).

Vectors are usually implemented with structs where const ref makes perfect sense. See smallptD vs. smallpt for a real-world example featuring loads of vector operations and randomness.
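
To illustrate the indirection point: a dynamic array parameter is already a small (pointer, length) pair that refers to the caller's elements, so ref is only needed when the callee must change the slice itself. A minimal sketch, with made-up function names:

```d
import std.stdio : writeln;

// The slice (pointer + length) is copied, but it still refers to the
// caller's elements, so element writes are visible outside.
void setFirst(int[] a) { a[0] = 42; }

// Appending only changes the local copy of the slice...
void growByValue(int[] a) { a ~= 7; }

// ...unless the slice itself is passed by reference.
void growByRef(ref int[] a) { a ~= 7; }

// `in` makes the parameter const: read-only access, no extra indirection.
int sum(in int[] a)
{
    int total = 0;
    foreach (v; a)
        total += v;
    return total;
}

void main()
{
    int[] xs = [1, 2, 3];
    setFirst(xs);
    writeln(xs);        // [42, 2, 3]
    growByValue(xs);
    writeln(xs.length); // 3
    growByRef(xs);
    writeln(xs.length); // 4
    writeln(sum(xs));   // 42 + 2 + 3 + 7 = 54
}
```

For a read-only function like scalar_product, passing the slice by value with in (or const) is therefore enough.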

Note that 64-bit can also make a difference. I once missed that on x64 gcc compiles 64-bit code while dmd still defaults to 32-bit (this will change when the 64-bit codegen matures). There was a remarkable speedup with "dmd -m64 ...".


Whether C++ or D is faster is likely to be highly dependent on what you're doing. I would think that when comparing well-written C++ to well-written D code, they would generally either be of similar speed, or C++ would be faster, but what the particular compiler manages to optimize could have a big effect completely aside from the language itself.

However, there are a few cases where D stands a good chance of beating C++ for speed. The main one which comes to mind would be string processing. Thanks to D's array slicing capabilities, strings (and arrays in general) can be processed much faster than you can readily do in C++. For D1, Tango's XML processor is extremely fast, thanks primarily to D's array slicing capabilities (and hopefully D2 will have a similarly fast XML parser once the one that's currently being worked on for Phobos has been completed). So, ultimately whether D or C++ is going to be faster is going to be very dependent on what you're doing.
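
As a rough idea of what slicing buys you, here is a small sketch of a field splitter (the function name fields is made up for illustration, and this is not how Tango's parser is written). Each field is a slice into the original string, so no character data is copied; only the small array holding the slices is allocated.

```d
import std.stdio : writeln;

// Splits `line` on `sep` without copying any characters: every field
// is a slice that points into the original string's memory.
string[] fields(string line, char sep)
{
    string[] result;
    size_t start = 0;
    foreach (i, char c; line)
    {
        if (c == sep)
        {
            result ~= line[start .. i];
            start = i + 1;
        }
    }
    result ~= line[start .. $]; // last field (possibly empty)
    return result;
}

void main()
{
    string line = "alpha,beta,gamma";
    auto parts = fields(line, ',');
    writeln(parts);                   // ["alpha", "beta", "gamma"]
    // The second field starts 6 bytes into the original buffer.
    writeln(parts[1].ptr - line.ptr); // 6
}
```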

Now, I am surprised that you're seeing such a difference in speed in this particular case, but it is the sort of thing that I would expect to improve as dmd improves. Using gdc might yield better results and would likely be a closer comparison of the language itself (rather than the backend) given that it's gcc-based. But it wouldn't surprise me at all if there are a number of things which could be done to speed up the code that dmd generates. I don't think that there's much question that gcc is more mature than dmd at this point. And code optimizations are one of the prime fruits of code maturity.

Ultimately, what matters is how well dmd performs for your particular application, but I do agree that it would definitely be nice to know how well C++ and D compare in general. In theory, they should be pretty much the same, but it really depends on the implementation. I think that a comprehensive set of benchmarks would be required to really test how well the two presently compare, however.


You can write C code in D, so which one is faster will depend on a lot of things:

  • What compiler you use
  • What features you use
  • How aggressively you optimize

Differences in the first aren't fair to drag in. The second might give C++ an advantage as it, if anything, has fewer heavy features. The third is the fun one: D code is in some ways easier to optimize because in general it is easier to understand. It also supports a large degree of generative programming, which allows verbose and repetitive but fast code to be written in a shorter form.
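
As one possible illustration of the generative programming point, here is a sketch that uses a string mixin to generate a fully unrolled dot product at compile time. The names unrolledDot and dot are made up for this example, and an optimizing backend may well unroll a plain loop on its own, so treat this as a demonstration of the mechanism rather than a guaranteed speedup.

```d
import std.conv : to;
import std.stdio : writeln;

// Builds the source text "0 + x[0] * y[0] + x[1] * y[1] + ..." as a string.
// Called at compile time below, so it costs nothing at run time.
string unrolledDot(size_t n)
{
    string code = "0";
    foreach (i; 0 .. n)
        code ~= " + x[" ~ i.to!string ~ "] * y[" ~ i.to!string ~ "]";
    return code;
}

// N is a compile-time constant, so the mixin expands to a loop-free expression.
int dot(size_t N)(in int[] x, in int[] y)
{
    assert(x.length >= N && y.length >= N);
    return mixin(unrolledDot(N));
}

void main()
{
    int[] a = [1, 2, 3];
    int[] b = [4, 5, 6];
    writeln(dot!3(a, b)); // 1*4 + 2*5 + 3*6 = 32
}
```

The repetitive expression is never written by hand; changing the template argument regenerates it for a different length.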