This may seem like a slightly stupid question, but after seeing Alexandre C's reply in the other topic, I'm curious whether there is any performance difference among the built-in types: <blockquote> <code>char</code> vs. <code>short</code> vs. <code>int</code> vs. <code>float</code> vs. <code>double</code>. </blockquote> We usually don't consider such performance differences (if any) in real-life projects, but I would like to know for educational purposes. The general questions are: <ul> <li>Is there any performance difference between integer arithmetic and floating-point arithmetic?</li> <li>Which is faster, and what is the reason for it being faster? Please explain.</li> </ul>


Performance of built-in types : char vs short vs int vs. float vs. double

1 Answer

Float vs. integer:

Historically, floating-point could be much slower than integer arithmetic. On modern computers, this is no longer really the case (it is somewhat slower on some platforms, but unless you write perfect code and optimize for every cycle, the difference will be swamped by the other inefficiencies in your code).

On somewhat limited processors, like those in high-end cell phones, floating-point may be somewhat slower than integer, but it's generally within an order of magnitude (or better), so long as there is hardware floating-point available. It's worth noting that this gap is closing pretty rapidly as cell phones are called on to run more and more general computing workloads.

On very limited processors (cheap cell phones and your toaster), there is generally no floating-point hardware, so floating-point operations need to be emulated in software. This is slow -- a couple of orders of magnitude slower than integer arithmetic.

As I said though, people are expecting their phones and other devices to behave more and more like "real computers", and hardware designers are rapidly beefing up FPUs to meet that demand. Unless you're chasing every last cycle, or you're writing code for very limited CPUs that have little or no floating-point support, the performance distinction doesn't matter to you.
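If you do want to check the float vs. integer difference on your own hardware, measuring beats guessing. Below is a minimal sketch, not a rigorous benchmark: it assumes a sum-of-squares loop is representative of your workload (it may well not be), and the names `sum_of_squares` and `time_sum_of_squares` are made up for illustration.

```cpp
#include <chrono>
#include <vector>

// Hypothetical workload: the same loop body instantiated for any arithmetic type.
// Keep the inputs small enough that the sum cannot overflow a signed integer type.
template <typename T>
T sum_of_squares(const std::vector<T>& values) {
    T acc{};
    for (T x : values) {
        acc += x * x;
    }
    return acc;
}

// Times one pass over `values`. The result is written to `result` so that the
// computation has an observable effect and is harder for the compiler to drop.
template <typename T>
std::chrono::nanoseconds time_sum_of_squares(const std::vector<T>& values, T& result) {
    const auto start = std::chrono::steady_clock::now();
    result = sum_of_squares(values);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
}
```

To compare types, call `time_sum_of_squares` with a `std::vector<int>` and a `std::vector<double>` of the same length, repeat each measurement many times, and look at the distribution rather than a single run. Compiler flags, vectorization, and cache state can easily matter more than the element type.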

Different size integer types:

Typically, CPUs are fastest at operating on integers of their native word size (with some caveats about 64-bit systems). 32-bit operations are often faster than 8- or 16-bit operations on modern CPUs, but this varies quite a bit between architectures. Also, remember that you can't consider the speed of a CPU in isolation; it's part of a complex system. Even if operating on 16-bit numbers is 2x slower than operating on 32-bit numbers, you can fit twice as much data into the cache hierarchy when you represent it with 16-bit numbers instead of 32-bit numbers. If that makes the difference between having all your data come from cache instead of taking frequent cache misses, then the faster memory access will trump the slower operation of the CPU.
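The cache argument is just arithmetic, which can be sketched as follows. The 32 KiB cache size is an assumption chosen for illustration, not a property of any particular CPU, and `elements_that_fit` is a hypothetical helper.

```cpp
#include <cstddef>
#include <cstdint>

// Number of elements of type T that fit in a cache of `cache_bytes` bytes.
template <typename T>
constexpr std::size_t elements_that_fit(std::size_t cache_bytes) {
    return cache_bytes / sizeof(T);
}

// Assumed for illustration only: a 32 KiB L1 data cache.
constexpr std::size_t kAssumedL1Bytes = 32 * 1024;

// Halving the element width doubles how many elements stay cache-resident.
static_assert(elements_that_fit<std::int16_t>(kAssumedL1Bytes) == 16384,
              "16-bit elements in the assumed cache");
static_assert(elements_that_fit<std::int32_t>(kAssumedL1Bytes) == 8192,
              "32-bit elements in the assumed cache");
```

So a working set of 10,000 values would fit in the assumed cache as 16-bit integers but not as 32-bit integers, which is the situation where the narrower type can win despite slower individual operations.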

Other notes:

Vectorization tips the balance further in favor of narrower types (float and 8- and 16-bit integers) -- you can do more operations in a vector of the same width. However, good vector code is hard to write, so it's not as though you get this benefit without a lot of careful work.
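To make "more operations in a vector of the same width" concrete, here is a small sketch that counts lanes per vector register. It assumes a 128-bit vector width and the usual 32-bit `float` and 64-bit `double`; `lanes` is a hypothetical helper, not part of any SIMD library.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// How many values of type T fit side by side in a vector of `vector_bits` bits.
template <typename T>
constexpr std::size_t lanes(std::size_t vector_bits) {
    return vector_bits / (sizeof(T) * CHAR_BIT);
}

// Assumes 64-bit double and 32-bit float, as on mainstream platforms.
static_assert(lanes<double>(128) == 2, "two doubles per 128-bit vector");
static_assert(lanes<float>(128) == 4, "four floats per 128-bit vector");
static_assert(lanes<std::int16_t>(128) == 8, "eight 16-bit integers per 128-bit vector");
static_assert(lanes<std::int8_t>(128) == 16, "sixteen 8-bit integers per 128-bit vector");
```

One vector instruction processes all lanes at once, so under these assumptions a `float` loop can do twice the work per instruction of a `double` loop, provided the code actually vectorizes.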

Why are there performance differences?

There are really only two factors that affect whether or not an operation is fast on a CPU: the circuit complexity of the operation, and user demand for the operation to be fast.

(Within reason) any operation can be made fast, if the chip designers are willing to throw enough transistors at the problem. But transistors cost money (or rather, using lots of transistors makes your chip larger, which means you get fewer chips per wafer and lower yields, which costs money), so chip designers have to balance how much complexity to use for which operations, and they do this based on (perceived) user demand. Roughly, you might think of breaking operations into four categories:

                 high demand            low demand
high complexity  FP add, multiply       division
low complexity   integer add            popcount, hcf
                 boolean ops, shifts

High-demand, low-complexity operations will be fast on nearly any CPU: they're the low-hanging fruit, and confer maximum user benefit per transistor.

High-demand, high-complexity operations will be fast on expensive CPUs (like those used in computers), because users are willing to pay for them. You're probably not willing to pay an extra $3 for your toaster to have a fast FP multiply, however, so cheap CPUs will skimp on these instructions.

Low-demand, high-complexity operations will generally be slow on nearly all processors; there just isn't enough benefit to justify the cost.

Low-demand, low-complexity operations will be fast if someone bothers to think about them, and non-existent otherwise.
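When such an operation is "non-existent" in hardware, it has to be spelled out in software. As a rough sketch of what that looks like for the two low-demand, low-complexity examples in the table (the function names `popcount_soft` and `hcf_soft` are made up for illustration):

```cpp
#include <cstdint>

// Software population count: each iteration clears the lowest set bit,
// so the loop runs once per set bit rather than once per bit position.
int popcount_soft(std::uint32_t x) {
    int count = 0;
    while (x != 0) {
        x &= x - 1;
        ++count;
    }
    return count;
}

// Software highest common factor (greatest common divisor), Euclid's algorithm.
std::uint32_t hcf_soft(std::uint32_t a, std::uint32_t b) {
    while (b != 0) {
        const std::uint32_t remainder = a % b;
        a = b;
        b = remainder;
    }
    return a;
}
```

Each of these takes several instructions per call, whereas a CPU with a dedicated population-count instruction does the first in one. In practice you would usually reach for `std::popcount` (C++20, `<bit>`) or `std::gcd` (C++17, `<numeric>`) and let the compiler pick a hardware instruction where one exists.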

Further reading:

Agner Fog maintains a nice website with lots of discussion of low-level performance details (and has very scientific data collection methodology to back it up).
The Intel® 64 and IA-32 Architectures Optimization Reference Manual (PDF download link is part way down the page) covers a lot of these issues as well, though it is focused on one specific family of architectures.


answered Sep 29 '22 14:09

Stephen Canon


Tags:

c++

performance

c

built-in

Nawaz
