Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Convert GCC Inline Assembly CMOV to Visual Studio Assembler

In the article Linear vs. Binary Search, there is a fast implementation of binary search which uses the CMOV instruction. I would like to implement this in VC++ as the application I'm working on relies on the performance of Binary Search.

enter image description here

The implementation has some GCC inline assembler which is declared as follows:

static int binary_cmov (const int *arr, int n, int key) {
    int min = 0, max = n;
    while (min < max) {
            int middle = (min + max) >> 1;

            asm ("cmpl %3, %2\n\tcmovg %4, %0\n\tcmovle %5, %1"
                 : "+r" (min),
                   "+r" (max)
                 : "r" (key), "g" (arr [middle]),
                   "g" (middle + 1), "g" (middle));

            // Equivalent to 
            // if (key > arr [middle])
            //    min = middle + 1;
            // else
            //    max = middle;
    }
    return min;
}

I'd like to convert this GCC Assembler to Microsoft Visual Studio compatible assembler, but as a GCC noob wouldn't know where to start.

Can anyone help, or at least explain the GCC Inline Assembler? TIA!

like image 642
Dr. Andrew Burnett-Thompson Avatar asked Dec 19 '22 16:12

Dr. Andrew Burnett-Thompson


2 Answers

refactoring the code in order to better express intent to the compiler:

int binary_cmov (const int *arr, int n, int key) 
{
    int min = 0, max = n;
    while (min < max) 
    {
      int middle = (min + max) >> 1;
      min = key > arr[middle] ? middle + 1 : min;
      max = key > arr[middle] ? max : middle;
    }
    return min;
}

With gcc5.3 -O3, yields:

binary_cmov(int const*, int, int):
        xorl    %eax, %eax
        testl   %esi, %esi
        jle     .L4
.L3:
        leal    (%rax,%rsi), %ecx
        sarl    %ecx
        movslq  %ecx, %r8
        leal    1(%rcx), %r9d
        movl    (%rdi,%r8,4), %r8d
        cmpl    %edx, %r8d
        cmovl   %r9d, %eax
        cmovge  %ecx, %esi
        cmpl    %eax, %esi
        jg      .L3
        rep ret
.L4:
        rep ret

moral of the story - don't embed assembler. All you do is make the code non-portable.

going further...

why not express intent even more explicitly?

#include <utility>

template<class...Ts>
  auto sum(Ts...ts)
{
  std::common_type_t<Ts...> result = 0;
  using expand = int[];
  void(expand{ 0, ((result += ts), 0)... });
  return result;
}

template<class...Ts>
  auto average(Ts...ts)
{
  return sum(ts...) / sizeof...(Ts);
}

int binary_cmov (const int *arr, int n, int key) 
{
    int min = 0, max = n;
    while (min < max) 
    {
      int middle = average(min, max);
      auto greater = key > arr[middle];
      min = greater ? middle + 1 : min;
      max = greater ? max : middle;
    }
    return min;
}

compiler output:

binary_cmov(int const*, int, int):
        xorl    %eax, %eax
        testl   %esi, %esi
        jle     .L4
.L3:
        leal    (%rax,%rsi), %ecx
        movslq  %ecx, %rcx
        shrq    %rcx
        movslq  %ecx, %r8
        leal    1(%rcx), %r9d
        movl    (%rdi,%r8,4), %r8d
        cmpl    %edx, %r8d
        cmovl   %r9d, %eax
        cmovge  %ecx, %esi
        cmpl    %eax, %esi
        jg      .L3
        rep ret
.L4:
        rep ret
like image 110
Richard Hodges Avatar answered Dec 24 '22 01:12

Richard Hodges


Making a small change to Richard Hodges' code also allows Visual Studio 2013 to use CMOV instruction:

int binary_cmov (const int *arr, int n, int key) 
{
    int min = 0, max = n;
    while (min < max) 
    {
      int middle = (min + max) >> 1;
      int middle1 = middle + 1;
      min = key > arr[middle] ? middle1 : min;
      max = key > arr[middle] ? max : middle;
    }
    return min;
}

Compiling with cl /Ox /Fa /c t286.c generates:

; Listing generated by Microsoft (R) Optimizing Compiler Version 18.00.40629.0 

binary_cmov PROC
    mov QWORD PTR [rsp+8], rbx
; Line 3
    xor eax, eax
    mov ebx, r8d
    mov r11d, edx
; Line 4
    test    edx, edx
    jle SHORT $LN9@binary_cmo
$LL2@binary_cmo:
; Line 6
    lea r10d, DWORD PTR [r11+rax]
    sar r10d, 1
; Line 8
    movsxd  r8, r10d
    lea r9d, DWORD PTR [r10+1]
    cmp ebx, DWORD PTR [rcx+r8*4]
; Line 9
    cmovg   r10d, r11d
    cmovg   eax, r9d
    mov r11d, r10d
    cmp eax, r10d
    jl  SHORT $LL2@binary_cmo
$LN9@binary_cmo:
; Line 12
    mov rbx, QWORD PTR [rsp+8]
    ret 0
binary_cmov ENDP

This also works with the Visual Studio 2015 and 2010 compilers. With Visual Studio the trick seems to be to use the ternary operators and use simple variables as the second and third operands. If you replace middle1 with middle + 1 in the first ternary operator above then Visual Studio only generates one CMOV instruction for this function. The first ternary operator generates a branch instead.

As mentioned by Yakk in a comment, the forthcoming Visual Stdio 2015 Update 3 compiler contains a major update to the optimizer that should change this behaviour, making it more likely to generate CMOV instructions where appropriate.

like image 22
Ross Ridge Avatar answered Dec 24 '22 03:12

Ross Ridge