Higher part of multiply and division in C or C++?


When I multiply a pair of 4-byte integers in x86 assembly, the lower part of the result is in EAX and the higher part is in EDX. If I am in C or C++ and I want the higher part, is it possible to get it without using inline assembly?

In the same way, is it possible to get the integer division result from EAX and the modulus result from EDX without repeating the division in C or C++? Currently I only know how to compute a/b first and then a%b, while in assembly both results come from the same operation.


asked Nov 30 '15 22:11

Juanito Perez

2 Answers

You can do it easily in C this way:

#include <stdint.h>

uint32_t a, b;  // input
uint64_t val = (uint64_t)a * b;
uint32_t high = val >> 32, low = val;

Leave it to the compiler to produce the best possible code. Modern optimizers are really good at it. Hand coded assembly often looks better but performs worse.
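As a minimal sketch, the snippet above can be wrapped in a small function. The names mul32_result and mul32_wide are hypothetical and chosen only for illustration; the technique is the same widening multiply followed by a shift.

```c
#include <stdint.h>

/* Hypothetical helper type: upper and lower 32 bits of a 64-bit product. */
typedef struct { uint32_t high, low; } mul32_result;

/* Multiply two 32-bit unsigned values and split the full 64-bit product. */
static mul32_result mul32_wide(uint32_t a, uint32_t b) {
    uint64_t val = (uint64_t)a * b;      /* widen one operand before multiplying */
    mul32_result r;
    r.high = (uint32_t)(val >> 32);      /* what x86 MUL would leave in EDX */
    r.low  = (uint32_t)val;              /* what x86 MUL would leave in EAX */
    return r;
}
```

For example, mul32_wide(0xFFFFFFFFu, 0xFFFFFFFFu) yields high = 0xFFFFFFFE and low = 1, since the full product is 0xFFFFFFFE00000001.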

As commented by Pete Becker, the above relies on the availability of the types uint32_t and uint64_t. If you insist on die-hard portability (say you are programming on a DS9K), you may instead use the types uint_least32_t and uint_least64_t, or uint_fast32_t and uint_fast64_t, which are always available under C99. Because these types may be wider than 32 and 64 bits, you then need an extra mask, which the compiler optimizes out when it is not required:

#include <stdint.h>

uint_fast32_t a, b;  // input
uint_fast64_t val = (uint_fast64_t)a * b;
uint_fast32_t high = (val >> 32) & 0xFFFFFFFF, low = val & 0xFFFFFFFF;

Regarding division, you can use the standard library functions div, ldiv or lldiv (declared in <stdlib.h>; lldiv was added in C99) to perform signed division and remainder operations in one call. The division/modulo combination will be implemented in one operation if possible on the target architecture for the specific operand types.
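A short sketch of the div approach follows. The helper split_seconds and its parameters are hypothetical, used only to show how the quot and rem members of div_t are read after a single call.

```c
#include <stdlib.h>

/* Hypothetical helper: splits a number of seconds into whole minutes
   and leftover seconds using one div() call. */
static void split_seconds(int total, int *minutes, int *seconds) {
    div_t d = div(total, 60);   /* d.quot == total / 60, d.rem == total % 60 */
    *minutes = d.quot;
    *seconds = d.rem;
}
```

Calling split_seconds(125, &m, &s) stores 2 in m and 5 in s. As with the / and % operators in C99, the quotient is truncated toward zero, so the remainder takes the sign of the numerator.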

It may be more efficient to write both expressions and rely on the compiler to detect the pattern and produce code that uses a single IDIV opcode:

struct divmod_t { int quo, rem; };
struct divmod_t divmod(int num, int denom) {
    struct divmod_t r = { num / denom, num % denom };
    return r;
}

Testing on Matt Godbolt's compiler explorer shows both clang and gcc generate a single idiv instruction for this code at -O3.

You can also replace the modulo operation with a multiplication and a subtraction:

struct divmod_t { int quo, rem; };
struct divmod_t divmod2(int num, int denom) {
    struct divmod_t r;
    r.quo = num / denom;
    r.rem = num - r.quo * denom;
    return r;
}

Note that the above functions do not check for the inputs that result in undefined behavior. This occurs if denom = 0 (division by zero), or if num = INT_MIN and denom = -1 (the quotient overflows).
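If those inputs can occur, they can be rejected before dividing. The following is a sketch under that assumption; divmod_checked and its bool return convention are hypothetical and not part of the functions above.

```c
#include <limits.h>
#include <stdbool.h>

struct divmod_t { int quo, rem; };

/* Hypothetical checked variant: returns false and leaves *out untouched
   for the inputs that would otherwise cause undefined behavior. */
static bool divmod_checked(int num, int denom, struct divmod_t *out) {
    if (denom == 0)
        return false;                       /* division by zero */
    if (num == INT_MIN && denom == -1)
        return false;                       /* quotient does not fit in int */
    out->quo = num / denom;
    out->rem = num % denom;
    return true;
}
```

The two checks are cheap compared with the division itself, and the quotient/remainder pair is still written as two adjacent expressions so the compiler can merge them into one instruction where the target allows it.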

answered Sep 25 '22 18:09

chqrlie

You don't deal with the implementation details in C or C++. That's the whole point. If you want the most significant bytes, simply use the language. The right shift operator >> is designed to do that. Something like:

uint32_t a, b;           // inputs
uint64_t i = a;          // widen first so the multiplication is done in 64 bits
i *= b;                  // i = a * b (thanks to @nnn, pls see comment below)
uint64_t msb = i >> 32;  // upper 32 bits of the product

answered Sep 26 '22 18:09

Paul Evans
