
What influences the speed of code?

The way to see how fast your code runs is performance profiling. There are tools for that, but I wonder what the factors are that actually determine code speed.

For instance, I've been told that image-editing software will use bitwise operations instead of integer arithmetic to calculate things, simply because it's faster.

So that must mean working with integers and other primitive types takes a few more steps to calculate compared to working with binary directly.
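For illustration, here's the kind of thing I mean (a hypothetical sketch in Java; the packed-ARGB layout and helper names are my assumptions, not from any particular editor):

```java
// Hypothetical sketch: channel math on a packed 32-bit ARGB pixel
// done with shifts and masks instead of multiplication and division.
public class PixelOps {
    // Extract the red channel (bits 16-23) of a packed ARGB int.
    static int red(int argb) {
        return (argb >>> 16) & 0xFF;
    }

    // Halve every channel at once: shift right by one, then mask
    // off the bit that would bleed in from the neighbouring channel.
    static int darken(int argb) {
        return (argb >>> 1) & 0x7F7F7F7F;
    }

    public static void main(String[] args) {
        int pixel = 0xFF8040C0; // A=FF, R=80, G=40, B=C0
        System.out.printf("red=%d darkened=%08X%n", red(pixel), darken(pixel));
    }
}
```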

There must be other factors, but I don't have enough experience with how an OS talks to the hardware, or with the internal workings of many programming languages, to know what they are.

So I'm asking here: do you know what influences the speed of code?

Not necessarily the speed of programs.

— Vordreller, asked Mar 06 '09



3 Answers

Integers are binary. Or rather, integers are simply integers, and can be represented in any number base. In base 10 you'd write 13, in base 16 you'd write d (or 0xd), in binary you'd write 1101 (or 0b1101). The Romans would've written XIII. They all represent the same concept of the number 'thirteen'. Among humans, we tend to use the base 10 representation, but when you ask a computer to process integers, it uses the binary representation. And it doesn't make a difference. Thirteen plus forty-five yields the same result no matter how I write it. XIII + 0x2D = 13 + 45 = 0xd + 0b101101. It doesn't matter which representation you use, the result of an arithmetic operation is the same. Which is why we allow the CPU to use the binary representation for all integer processing.
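To make that concrete, here is a small sketch of my own (not part of the original answer) in Java, which has literals for several bases built in:

```java
public class Bases {
    public static void main(String[] args) {
        // The same number thirteen, written in four bases.
        int decimal = 13;
        int hex     = 0xD;
        int binary  = 0b1101;
        int octal   = 015;

        System.out.println(decimal == hex);    // true
        System.out.println(hex == binary);     // true
        System.out.println(binary == octal);   // true

        // Arithmetic is unaffected by the notation used.
        System.out.println(13 + 45 == 0xD + 0b101101); // true
    }
}
```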

Some programming languages also give you a "decimal" datatype, but that's generally related to floating-point arithmetic, where not all values can be represented in all bases (1/3 can be represented exactly in base 3, but not in base 2 or base 10, for example; 1/10 can be represented in base 10, but not in base 2).
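For instance, a sketch like the following (my example, not the answer's) shows the base-2 representability problem in Java's double, and the base-10 BigDecimal type that sidesteps it:

```java
import java.math.BigDecimal;

public class DecimalDemo {
    public static void main(String[] args) {
        // 0.1 has no exact binary representation, so error accumulates.
        double d = 0.1 + 0.1 + 0.1;
        System.out.println(d == 0.3); // false
        System.out.println(d);        // 0.30000000000000004

        // BigDecimal stores the value in base 10, so 0.1 is exact.
        BigDecimal b = new BigDecimal("0.1")
                .add(new BigDecimal("0.1"))
                .add(new BigDecimal("0.1"));
        System.out.println(b);        // 0.3
    }
}
```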

However, it is surprisingly difficult to single out any particular operation as "slow", because it depends. A modern CPU employs a lot of tricks and optimizations to speed up most operations most of the time. So really, what you need to do to get efficient code is avoid all the special cases. And there are a lot of them, and they generally have more to do with the combination (and order) of instructions than with which individual instructions are used.

Just to give you an idea of the kind of subtleties we're talking about: floating-point arithmetic can execute as fast as (or sometimes faster than) integer arithmetic under ideal circumstances, but its latency is longer, meaning that ideal performance is harder to achieve. Branches that are otherwise almost free become painful because they inhibit instruction reordering and scheduling, both in the compiler and on the fly in the CPU, which makes it harder to hide this latency. Latency defines how long it takes from when an instruction is initiated until its result is ready; most instructions only occupy the CPU for one clock cycle. Even if the result isn't ready the following cycle, the CPU is able to start another instruction then. Which means that if the result is not immediately needed, high-latency instructions are almost free. But if you need to feed the result to the next instruction, then that instruction will have to wait until the result is finished.
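To sketch what hiding latency means in code (my illustration; actual gains depend on the CPU and, in Java, on what the JIT does), compare a sum with one accumulator against one with two independent accumulators:

```java
public class LatencyDemo {
    // Every addition depends on the previous one: a serial chain
    // whose speed is bounded by the add instruction's latency.
    static double sumSerial(double[] a) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += a[i];
        return s;
    }

    // Two independent chains can be in flight at once, so the
    // CPU can start the next add before the previous one finishes.
    static double sumPaired(double[] a) {
        double s0 = 0, s1 = 0;
        int i = 0;
        for (; i + 1 < a.length; i += 2) {
            s0 += a[i];
            s1 += a[i + 1];
        }
        if (i < a.length) s0 += a[i]; // odd-length leftover
        return s0 + s1;
    }
}
```

Note that reassociating floating-point additions can change the result in the last few bits, which is one reason a compiler won't make this transformation for you automatically.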

Certain instructions are just slow no matter what you do, and will typically stall the relevant parts of the CPU until they have completed (square root is a common example, and integer division may be another; on some CPUs, doubles in general suffer from the same problem). On the other hand, while a floating-point square root will block the FP pipeline, it won't prevent you from executing integer instructions simultaneously.

Sometimes, storing values in variables that could instead be recomputed as needed will be faster, because they can be kept in a register, saving a few cycles. Other times it will be slower, because you run out of registers and the value has to be pushed to the cache, or even to RAM, making recomputation on every use preferable.

The order in which you traverse memory makes a huge difference (a sketch follows below). Random (scattered) accesses can take hundreds of cycles to complete, while sequential ones are almost instant. Performing your reads and writes in the right pattern allows the CPU to keep the required data in cache almost all the time, and usually "the right pattern" means reading data sequentially and working on chunks of ~64 KB at a time. But sometimes not.

On an x86 CPU, some instructions take up 1 byte, others up to 15. If your code contains a lot of the former, instruction fetch and decoding won't be a bottleneck, but if it's full of the longer instructions, that will probably limit how many instructions the CPU can load each cycle, and then the number it would be able to execute doesn't matter.
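Here is a sketch of the traversal-order point (my example, not the answer's; the size is illustrative): summing a 2D array row by row walks memory sequentially, while column by column strides across it.

```java
public class Traversal {
    static final int N = 4096;

    // Java stores int[N][N] as rows of contiguous ints, so this
    // walks memory sequentially and stays cache-friendly.
    static long sumRowMajor(int[][] m) {
        long s = 0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s += m[i][j];
        return s;
    }

    // Same arithmetic, but each access jumps to a different row, so
    // the cache line fetched for m[i][j] is usually evicted long
    // before m[i+1][j] would reuse it.
    static long sumColumnMajor(int[][] m) {
        long s = 0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s += m[i][j];
        return s;
    }

    public static void main(String[] args) {
        int[][] m = new int[N][N];
        m[1][2] = 7;
        System.out.println(sumRowMajor(m) == sumColumnMajor(m)); // true
    }
}
```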

There are very few universal rules for performance in a modern CPU.

— jalf


I think you're mistaken. Integers are binary numbers. Image-editing software will do anything it can to avoid floating-point calculations, because they are incredibly slow compared to integer or bit-shift operations.
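As a sketch of what that avoidance can look like (my illustration; real editors vary), alpha blending a single 8-bit channel can be done entirely in integer arithmetic by treating the alpha as a fixed-point fraction out of 256:

```java
public class Blend {
    // Blend source over destination for one 8-bit channel.
    // The exact formula is (a*src + (255-a)*dst) / 255; using 256
    // and a shift by 8 approximates the division, avoiding it.
    static int blendChannel(int src, int dst, int alpha) {
        return (alpha * src + (256 - alpha) * dst) >> 8;
    }

    public static void main(String[] args) {
        // ~50% blend of full brightness over black: about 127.
        System.out.println(blendChannel(255, 0, 128));
    }
}
```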

But normally, first and foremost you optimize by choosing the right algorithm, not by fiddly little things like worrying about whether to post-increment or pre-increment.

For instance: I just spent the last two days speeding up the way we recalculated a particular set of values. I yanked some stuff out of a loop and pre-calculated it, so it was only done M times instead of M x N times, and stored a value in a variable instead of looking it up each time from somewhere else because that value was used in a Comparator so it would be called a lot during the Collections.sort phase. I got the total execution time from around 45 seconds to 20 seconds. And then one of my coworkers who'd been here a lot longer pointed out that I didn't need to recalculate those values, I could pull them out of a different object. And suddenly it executes in 2 seconds. Now that is optimization I can believe in.
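A sketch of the comparator part of that story (all class and method names here are invented for illustration, not from the actual codebase): compute the sort key once per element instead of on every comparison.

```java
import java.util.*;

public class SortKeyDemo {
    record Item(String name) {}

    // Hypothetical expensive derivation of a sort key.
    static int expensiveScore(Item it) {
        return it.name().hashCode(); // stand-in for real work
    }

    public static void main(String[] args) {
        List<Item> items = new ArrayList<>(List.of(
                new Item("b"), new Item("a"), new Item("c")));

        // Slow: the key is recomputed on every comparison,
        // i.e. O(n log n) times during the sort.
        items.sort(Comparator.comparingInt(SortKeyDemo::expensiveScore));

        // Faster: compute each key once (n times), then sort
        // on the cached value.
        Map<Item, Integer> score = new HashMap<>();
        for (Item it : items) score.put(it, expensiveScore(it));
        items.sort(Comparator.comparingInt(score::get));
    }
}
```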

— Paul Tomblin


The "speed of programs" usually comes down to algorithm choice. The wrong algorithm can turn a 2 second task into a 2 minute one or worse. You will see the best performance gains when you concentrate on getting this right.

Once you have a good algorithm, you may still need a more efficient implementation. Attaining this often relies on "speed of code" type choices. There are several things to consider, which tend to be very hardware-dependent. Optimizations for one processor may actually slow the code down on others.

Some "code speed" factors:

  • integer vs. floating-point numbers
  • costs of branching, function calls, and context switches (see the sketch after this list)
  • CPU instruction pipeline disruptions
  • maintaining locality of memory access and cache coherence
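As a sketch of the branching cost (the classic branch-prediction effect, in my own words; this toy timing ignores JIT warm-up, so treat the numbers as illustrative only): the same loop runs measurably faster over sorted data, because the branch becomes predictable.

```java
import java.util.*;

public class BranchDemo {
    static long sumAbove(int[] data) {
        long sum = 0;
        for (int x : data)
            if (x >= 128) sum += x; // this branch is the cost
        return sum;
    }

    public static void main(String[] args) {
        int[] data = new Random(42).ints(1 << 20, 0, 256).toArray();

        long t0 = System.nanoTime();
        long r1 = sumAbove(data);           // random data: ~50% mispredictions
        long unsorted = System.nanoTime() - t0;

        Arrays.sort(data);                  // same values, predictable branch
        t0 = System.nanoTime();
        long r2 = sumAbove(data);
        long sorted = System.nanoTime() - t0;

        System.out.printf("sum=%d/%d unsorted=%dns sorted=%dns%n",
                r1, r2, unsorted, sorted);
    }
}
```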
— AShelly