
What influences the speed of code?

The way to see how fast your code runs is performance profiling. There are tools for that, but I wonder what the factors are that actually determine code speed.

For instance, I've been told that image-editing software will use bitwise operations instead of integer arithmetic to calculate things, simply because it's faster.

So that must mean working with integers and other primitive types takes a few more steps to calculate compared to working with binary directly.
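For illustration, here's the kind of thing I mean (a hypothetical sketch in Java; the packed-ARGB layout and helper names are my assumptions, not from any particular editor):

```java
// Hypothetical sketch: channel math on a packed 32-bit ARGB pixel
// done with shifts and masks instead of multiplication and division.
public class PixelOps {
    // Extract the red channel (bits 16-23) of a packed ARGB int.
    static int red(int argb) {
        return (argb >>> 16) & 0xFF;
    }

    // Halve every channel at once: shift right by one, then mask
    // off the bit that would bleed in from the neighbouring channel.
    static int darken(int argb) {
        return (argb >>> 1) & 0x7F7F7F7F;
    }

    public static void main(String[] args) {
        int pixel = 0xFF8040C0; // A=FF, R=80, G=40, B=C0
        System.out.printf("red=%d darkened=%08X%n", red(pixel), darken(pixel));
    }
}
```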

There must be other factors, but I don't have enough experience with how an OS talks to the hardware, or with the internal workings of many programming languages, to know what they are.

So I'm asking here: do you know what influences the speed of code?

Not necessarily the speed of programs.

— Vordreller, asked Mar 06 '09



3 Answers

Integers are binary. Or rather, integers are simply integers, and can be represented in any number base. In base 10 you'd write 13, in base 16 you'd write d (or 0xd), in binary you'd write 1101 (or 0b1101). The Romans would've written XIII. They all represent the same concept of the number 'thirteen'. Among humans, we tend to use the base 10 representation, but when you ask a computer to process integers, it uses the binary representation. And it doesn't make a difference. Thirteen plus forty-five yields the same result no matter how I write it. XIII + 0x2D = 13 + 45 = 0xd + 0b101101. It doesn't matter which representation you use, the result of an arithmetic operation is the same. Which is why we allow the CPU to use the binary representation for all integer processing.
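To make that concrete, here is a small sketch of my own (not part of the original answer) in Java, which has literals for several bases built in:

```java
public class Bases {
    public static void main(String[] args) {
        // The same number thirteen, written in four bases.
        int decimal = 13;
        int hex     = 0xD;
        int binary  = 0b1101;
        int octal   = 015;

        System.out.println(decimal == hex);    // true
        System.out.println(hex == binary);     // true
        System.out.println(binary == octal);   // true

        // Arithmetic is unaffected by the notation used.
        System.out.println(13 + 45 == 0xD + 0b101101); // true
    }
}
```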

Some programming languages also give you a "decimal" datatype, but that's generally related to floating-point arithmetic, where not all values can be represented in all bases (1/3 can be represented exactly in base 3, but not in base 2 or base 10, for example; 1/10 can be represented in base 10, but not in base 2).
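For instance, a sketch like the following (my example, not the answer's) shows the base-2 representability problem in Java's double, and the base-10 BigDecimal type that sidesteps it:

```java
import java.math.BigDecimal;

public class DecimalDemo {
    public static void main(String[] args) {
        // 0.1 has no exact binary representation, so error accumulates.
        double d = 0.1 + 0.1 + 0.1;
        System.out.println(d == 0.3); // false
        System.out.println(d);        // 0.30000000000000004

        // BigDecimal stores the value in base 10, so 0.1 is exact.
        BigDecimal b = new BigDecimal("0.1")
                .add(new BigDecimal("0.1"))
                .add(new BigDecimal("0.1"));
        System.out.println(b);        // 0.3
    }
}
```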

However, it is surprisingly difficult to single out any particular operation as "slow", because it depends. A modern CPU employs a lot of tricks and optimizations to speed up most operations most of the time. So really, what you need to do to get efficient code is avoid all the special cases. And there are a lot of them, and they generally have more to do with the combination (and order) of instructions than with which individual instructions are used.

Just to give you an idea of the kind of subtleties we're talking about: floating-point arithmetic can execute as fast as (or sometimes faster than) integer arithmetic under ideal circumstances, but its latency is longer, meaning that ideal performance is harder to achieve. Branches that are otherwise almost free become painful because they inhibit instruction reordering and scheduling, both in the compiler and on the fly in the CPU, which makes it harder to hide this latency. Latency defines how long it takes from when an instruction is initiated until its result is ready; most instructions only occupy the CPU for one clock cycle. Even if the result isn't ready the following cycle, the CPU is able to start another instruction then. Which means that if the result is not immediately needed, high-latency instructions are almost free. But if you need to feed the result to the next instruction, then that instruction will have to wait until the result is finished.
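To sketch what hiding latency means in code (my illustration; actual gains depend on the CPU and, in Java, on what the JIT does), compare a sum with one accumulator against one with two independent accumulators:

```java
public class LatencyDemo {
    // Every addition depends on the previous one: a serial chain
    // whose speed is bounded by the add instruction's latency.
    static double sumSerial(double[] a) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += a[i];
        return s;
    }

    // Two independent chains can be in flight at once, so the
    // CPU can start the next add before the previous one finishes.
    static double sumPaired(double[] a) {
        double s0 = 0, s1 = 0;
        int i = 0;
        for (; i + 1 < a.length; i += 2) {
            s0 += a[i];
            s1 += a[i + 1];
        }
        if (i < a.length) s0 += a[i]; // odd-length leftover
        return s0 + s1;
    }
}
```

Note that reassociating floating-point additions can change the result in the last few bits, which is one reason a compiler won't make this transformation for you automatically.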

Certain instructions are just slow no matter what you do, and will typically stall the relevant parts of the CPU until they have completed (square root is a common example, and integer division may be another; on some CPUs, doubles in general suffer from the same problem). On the other hand, while a floating-point square root will block the FP pipeline, it won't prevent you from executing integer instructions simultaneously.

Sometimes, storing values in variables that could instead be recomputed as needed will be faster, because they can be kept in a register, saving a few cycles. Other times it will be slower, because you run out of registers and the value has to be pushed to the cache, or even to RAM, making recomputation on every use preferable.

The order in which you traverse memory makes a huge difference (a sketch follows below). Random (scattered) accesses can take hundreds of cycles to complete, while sequential ones are almost instant. Performing your reads and writes in the right pattern allows the CPU to keep the required data in cache almost all the time, and usually "the right pattern" means reading data sequentially and working on chunks of ~64 KB at a time. But sometimes not.

On an x86 CPU, some instructions take up 1 byte, others up to 15. If your code contains a lot of the former, instruction fetch and decoding won't be a bottleneck, but if it's full of the longer instructions, that will probably limit how many instructions the CPU can load each cycle, and then the number it would be able to execute doesn't matter.
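Here is a sketch of the traversal-order point (my example, not the answer's; the size is illustrative): summing a 2D array row by row walks memory sequentially, while column by column strides across it.

```java
public class Traversal {
    static final int N = 4096;

    // Java stores int[N][N] as rows of contiguous ints, so this
    // walks memory sequentially and stays cache-friendly.
    static long sumRowMajor(int[][] m) {
        long s = 0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s += m[i][j];
        return s;
    }

    // Same arithmetic, but each access jumps to a different row, so
    // the cache line fetched for m[i][j] is usually evicted long
    // before m[i+1][j] would reuse it.
    static long sumColumnMajor(int[][] m) {
        long s = 0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s += m[i][j];
        return s;
    }

    public static void main(String[] args) {
        int[][] m = new int[N][N];
        m[1][2] = 7;
        System.out.println(sumRowMajor(m) == sumColumnMajor(m)); // true
    }
}
```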

There are very few universal rules for performance in a modern CPU.

— jalf


I think you're mistaken. Integers are binary numbers. Image-editing software will do anything it can to avoid floating-point calculations, because they are incredibly slow compared to integer or bit-shift operations.
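As a sketch of what that avoidance can look like (my illustration; real editors vary), alpha blending a single 8-bit channel can be done entirely in integer arithmetic by treating the alpha as a fixed-point fraction out of 256:

```java
public class Blend {
    // Blend source over destination for one 8-bit channel.
    // The exact formula is (a*src + (255-a)*dst) / 255; using 256
    // and a shift by 8 approximates the division, avoiding it.
    static int blendChannel(int src, int dst, int alpha) {
        return (alpha * src + (256 - alpha) * dst) >> 8;
    }

    public static void main(String[] args) {
        // ~50% blend of full brightness over black: about 127.
        System.out.println(blendChannel(255, 0, 128));
    }
}
```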

But normally, first and foremost you optimize by choosing the right algorithm, not by fiddly little things like worrying about whether to post-increment or pre-increment.

For instance: I just spent the last two days speeding up the way we recalculated a particular set of values. I yanked some stuff out of a loop and pre-calculated it, so it was only done M times instead of M x N times, and stored a value in a variable instead of looking it up each time from somewhere else because that value was used in a Comparator so it would be called a lot during the Collections.sort phase. I got the total execution time from around 45 seconds to 20 seconds. And then one of my coworkers who'd been here a lot longer pointed out that I didn't need to recalculate those values, I could pull them out of a different object. And suddenly it executes in 2 seconds. Now that is optimization I can believe in.
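A sketch of the comparator part of that story (all class and method names here are invented for illustration, not from the actual codebase): compute the sort key once per element instead of on every comparison.

```java
import java.util.*;

public class SortKeyDemo {
    record Item(String name) {}

    // Hypothetical expensive derivation of a sort key.
    static int expensiveScore(Item it) {
        return it.name().hashCode(); // stand-in for real work
    }

    public static void main(String[] args) {
        List<Item> items = new ArrayList<>(List.of(
                new Item("b"), new Item("a"), new Item("c")));

        // Slow: the key is recomputed on every comparison,
        // i.e. O(n log n) times during the sort.
        items.sort(Comparator.comparingInt(SortKeyDemo::expensiveScore));

        // Faster: compute each key once (n times), then sort
        // on the cached value.
        Map<Item, Integer> score = new HashMap<>();
        for (Item it : items) score.put(it, expensiveScore(it));
        items.sort(Comparator.comparingInt(score::get));
    }
}
```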

— Paul Tomblin


The "speed of programs" usually comes down to algorithm choice. The wrong algorithm can turn a 2 second task into a 2 minute one or worse. You will see the best performance gains when you concentrate on getting this right.

Once you have a good algorithm, you may still need a more efficient implementation. Attaining this often relies on "speed of code" type choices. There are several things to consider, which tend to be very hardware-dependent. Optimizations for one processor may actually slow the code down on others.

Some "code speed" factors:

  • integer vs. floating-point numbers
  • costs of branching, function calls, and context switches (see the sketch after this list)
  • CPU instruction pipeline disruptions
  • maintaining locality of memory access and cache coherence
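As a sketch of the branching cost (the classic branch-prediction effect, in my own words; this toy timing ignores JIT warm-up, so treat the numbers as illustrative only): the same loop runs measurably faster over sorted data, because the branch becomes predictable.

```java
import java.util.*;

public class BranchDemo {
    static long sumAbove(int[] data) {
        long sum = 0;
        for (int x : data)
            if (x >= 128) sum += x; // this branch is the cost
        return sum;
    }

    public static void main(String[] args) {
        int[] data = new Random(42).ints(1 << 20, 0, 256).toArray();

        long t0 = System.nanoTime();
        long r1 = sumAbove(data);           // random data: ~50% mispredictions
        long unsorted = System.nanoTime() - t0;

        Arrays.sort(data);                  // same values, predictable branch
        t0 = System.nanoTime();
        long r2 = sumAbove(data);
        long sorted = System.nanoTime() - t0;

        System.out.printf("sum=%d/%d unsorted=%dns sorted=%dns%n",
                r1, r2, unsorted, sorted);
    }
}
```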
— AShelly