 

How expensive is it to convert between int and double?

I often see code that converts ints to doubles to ints to doubles and back once again (sometimes for good reasons, sometimes not), and it just occurred to me that this seems like a "hidden" cost in my program. Let's assume the conversion method is truncation.

So, just how expensive is it? I'm sure it varies depending on hardware, so let's assume a newish Intel processor (Haswell, if you like, though I'll take anything). Some metrics I'd be interested in (though a good answer needn't have all of them):

  1. # of generated instructions
  2. # of cycles used
  3. Relative cost compared to basic arithmetic operations

I would also assume that the way we would most acutely experience the impact of a slow conversion would be with respect to power usage rather than execution speed, given the difference in how many computations we can perform each second relative to how much data can actually arrive at the CPU each second.
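
For concreteness, here's a small hypothetical example of the kind of round-tripping I mean (the function and its names are made up purely for illustration):

    // Hypothetical example: the int samples are widened to double for the
    // scaling, and each result is truncated back to int, so every loop
    // iteration pays for one int->double and one double->int conversion.
    #include <vector>

    int scale_and_truncate(const std::vector<int>& samples, double gain)
    {
        int total = 0;
        for (int s : samples) {
            double scaled = s * gain;           // int -> double
            total += static_cast<int>(scaled);  // double -> int (truncation)
        }
        return total;
    }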

asked Feb 23 '15 by Mark

1 Answer

Here's what I could dig up myself, for x86-64 doing FP math with SSE2 (not legacy x87 where changing the rounding mode for C++'s truncation semantics was expensive):

  1. When I take a look at the generated assembly from clang and gcc, the cast from double to int boils down to one instruction: cvttsd2si.

    From int to double it's cvtsi2sd. (cvtsi2sdl is the AT&T-syntax form of cvtsi2sd with a 32-bit operand size.) A minimal pair of functions showing this codegen is sketched after this list.

    With auto-vectorization, we get cvtdq2pd.

    So I suppose the question becomes: what is the cost of those?

  2. These instructions each cost approximately the same as an FP addsd plus a movq xmm, r64 (fp <- integer) or movq r64, xmm (integer <- fp), because they decode to 2 uops that run on the same ports, on mainstream Intel CPUs (Sandy Bridge/Haswell/Skylake).

    The Intel® 64 and IA-32 Architectures Optimization Reference Manual says that the cvttsd2si instruction has a latency of 5 cycles (see Appendix C-16). cvtsi2sd, depending on the architecture, has a latency varying from 1 cycle on Silvermont to more like 7-16 cycles on several other architectures.

    Agner Fog's instruction tables have more accurate/sensible numbers, like 5-cycle latency for cvtsi2sd on Silvermont (with one per 2 clocks throughput), or 4c latency on Haswell, with one per clock throughput (provided you avoid the dependency on the destination register from merging with its old upper half, which gcc usually does by zeroing the register first with pxor xmm0,xmm0).

    SIMD packed-float to packed-int is great; single uop. But converting to double requires a shuffle to change element size. SIMD float/double<->int64_t doesn't exist until AVX512, but can be done manually with limited range.

    Intel's manual defines latency as: "The number of clock cycles that are required for the execution core to complete the execution of all of the μops that form an instruction." But a more useful definition is the number of clocks from an input being ready until the output becomes ready. Throughput is more important than latency if there's enough parallelism for out-of-order execution to do its job: What considerations go into predicting latency for operations on modern superscalar processors and how can I calculate them by hand?.

  3. The same Intel manual says that an integer add instruction has a latency of 1 cycle and an integer imul has a latency of 3 cycles (Appendix C-27). FP addsd and mulsd run at 2 per clock throughput, with 4 cycle latency, on Skylake. Same for the SIMD versions, and for FMA, with 128 or 256-bit vectors.

    On Haswell, addsd / addpd is only 1 per clock throughput, but 3 cycle latency thanks to a dedicated FP-add unit.
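
To make point 1 concrete, here's a minimal pair of conversion functions; the assembly noted in the comments is roughly what clang and gcc emit for x86-64 with SSE2 (exact registers depend on the compiler and calling convention):

    // Scalar double <-> int conversions on x86-64 (SSE2, System V ABI).

    int double_to_int(double d) {
        return static_cast<int>(d);  // cvttsd2si eax, xmm0  (truncating)
    }

    double int_to_double(int i) {
        // gcc typically zeroes the destination first (pxor xmm0, xmm0) to
        // break the false dependency, then: cvtsi2sd xmm0, edi
        return static_cast<double>(i);
    }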

So, the answer boils down to:

1) It's hardware optimized, and the compiler leverages the hardware machinery.

2) In terms of cycles, it costs only a bit more than a multiply does in one direction, and a highly variable amount in the other (depending on your architecture). It's neither free nor absurdly expensive, but probably warrants more attention given how easy it is to write code that incurs the cost in a non-obvious way.
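
If you want a ballpark number on your own machine, a rough micro-benchmark sketch like the one below (illustrative only; a serious measurement would use performance counters or Agner Fog's test scripts) chains the two conversions through a single variable so that latency, rather than throughput, dominates:

    // Rough sketch: times a serial chain of double->int and int->double
    // conversions. Compile with optimizations, e.g. g++ -O2.
    #include <chrono>
    #include <cstdio>

    int main() {
        const long iters = 100000000;
        double acc = 1.0;

        auto t0 = std::chrono::steady_clock::now();
        for (long i = 0; i < iters; ++i) {
            int truncated = static_cast<int>(acc);        // double -> int
            acc = static_cast<double>(truncated) + 1.25;  // int -> double
        }
        auto t1 = std::chrono::steady_clock::now();

        std::chrono::duration<double> dt = t1 - t0;
        std::printf("%.2f ns per round trip (acc=%f)\n",
                    dt.count() * 1e9 / iters, acc);
    }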

answered Sep 23 '22 by Mark