What's the proper way of calling a Win32/64 function from LLVM?

Question

I'm attempting to call a method from LLVM IR back to C++ code. I'm working in 64-bit Visual C++, or as LLVM describes it:

Machine CPU:      skylake
Machine info:     x86_64-pc-windows-msvc

For integer and pointer types my code works fine as-is. However, floating-point numbers seem to be handled a bit strangely.

Basically the call looks like this:

struct SomeStruct 
{
    static void Breakpoint(uint8_t* ptr) { return; } // used to set a breakpoint
    static double Set(uint8_t* ptr, double foo) { return foo * 2; }
};

and LLVM IR looks like this:

define i32 @main(i32, i8**) {
varinit:
  ; omitted here: initialize %ptr from i8**.
  %5 = load i8*, i8** %instance0

  ; call to some method. This works - I use it to set a breakpoint
  call void @"Helper::Breakpoint"(i8* %5)

  ; this call fails:
  call void @"Helper::Set"(i8* %5, double 0xC19EC46965A6494D)
  ret i32 0
}

declare double @"SomeStruct::Callback"(i8*, double)

I figured that the problem is probably in the way the calling conventions work. So I've attempted to make some adjustments to correct for that:

// during initialization of the function
auto function = llvm::Function::Create(functionType, llvm::Function::ExternalLinkage, name, module);
function->setCallingConv(llvm::CallingConv::X86_64_Win64);
...

// during calling of the function
call->setCallingConv(llvm::CallingConv::X86_64_Win64);

Unfortunately, no matter what I try, I end up with 'invalid instruction' errors, which another user reports to be a calling-convention issue (see "Clang producing executable with illegal instruction"). I've tried X86_64_Win64, Stdcall, Fastcall and no calling convention specification at all, with the same result each time.

I've read up on https://msdn.microsoft.com/en-us/library/ms235286.aspx in an attempt to figure out what's going on. Then I looked at the assembly output that's supposed to be generated by LLVM (using the targetMachine->addPassesToEmitFile API call) and found:

    movq    (%rdx), %rsi
    movq    %rsi, %rcx
    callq   "Helper2<double>::Breakpoint"
    vmovsd  __real@c19ec46965a6494d(%rip), %xmm1
    movq    %rsi, %rcx
    callq   "Helper2<double>::Set"
    xorl    %eax, %eax
    addq    $32, %rsp
    popq    %rsi

According to MSDN, argument 2 should be in %xmm1, so that also seems correct. However, when I step through the code in the debugger, Visual Studio shows question marks in the disassembly (that is, an illegal instruction).
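To make the register rule concrete, here is a minimal sketch of how the Microsoft x64 convention picks a register for the first four scalar arguments of a non-variadic function. The names `ArgClass`, `Reg` and `win64ArgRegister` are made up for this illustration and are not part of LLVM or any Windows API.

```cpp
#include <cstddef>

// Argument classes that matter for register assignment in this sketch.
enum class ArgClass { IntegerOrPointer, FloatingPoint };

// Registers used for the first four arguments; Stack means "passed in memory".
enum class Reg { RCX, RDX, R8, R9, XMM0, XMM1, XMM2, XMM3, Stack };

// Microsoft x64 convention: the slot is chosen by argument position alone.
// Slot i uses the i-th integer register or the i-th XMM register, depending
// on the argument's class; arguments past the fourth go on the stack.
constexpr Reg win64ArgRegister(std::size_t position, ArgClass cls)
{
    return position >= 4 ? Reg::Stack
         : cls == ArgClass::FloatingPoint
             ? (position == 0 ? Reg::XMM0 : position == 1 ? Reg::XMM1
              : position == 2 ? Reg::XMM2 : Reg::XMM3)
             : (position == 0 ? Reg::RCX : position == 1 ? Reg::RDX
              : position == 2 ? Reg::R8 : Reg::R9);
}
```

For a signature like (uint8_t*, double), position 0 maps to RCX and position 1 maps to XMM1, which matches the assembly LLVM generated above. So the calling convention itself is not the obvious suspect.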

Any feedback is appreciated.

The disassembly code:

00000144F2480007 48 B8 B6 48 B8 C8 FA 7F 00 00 mov         rax,7FFAC8B848B6h  
00000144F2480011 48 89 D1             mov         rcx,rdx  
00000144F2480014 48 89 54 24 20       mov         qword ptr [rsp+20h],rdx  
00000144F2480019 FF D0                call        rax  
00000144F248001B 48 B8 C0 48 B8 C8 FA 7F 00 00 mov         rax,7FFAC8B848C0h  
00000144F2480025 48 B9 00 00 47 F2 44 01 00 00 mov         rcx,144F2470000h  
00000144F248002F ??                   ?? ?? 
00000144F2480030 ??                   ?? ?? 
00000144F2480031 FF 08                dec         dword ptr [rax]  
00000144F2480033 10 09                adc         byte ptr [rcx],cl  
00000144F2480035 48 8B 4C 24 20       mov         rcx,qword ptr [rsp+20h]  
00000144F248003A FF D0                call        rax  
00000144F248003C 31 C0                xor         eax,eax  
00000144F248003E 48 83 C4 28          add         rsp,28h  
00000144F2480042 C3                   ret

The disassembly hides some of the bytes. This is the memory view of the same region:

0x00000144F248001B 48 b8 c0 48 b8 c8 fa 7f 00 00 48 b9 00 00 47 f2 44 01 00 00 62 f1 ff 08 10 09 48 8b 4c 24 20 ff d0 31 c0 48 83 c4 28 c3 00 00 00 00 00 ...

The bytes hidden behind the question marks are '62 f1'.

It probably helps to see how I get the JIT to compile the module. I'm afraid the code is a bit long, but I have no clue how to reduce it to a smaller example.

    // Note: FunctionBinderBase basically holds an llvm::Function* object
    // which is bound using the above code and a name.
    llvm::ExecutionEngine* Module::Compile(std::unordered_map<std::string, FunctionBinderBase*>& externalFunctions)
    {
        //          DebugFlag = true;

#if (LLVMDEBUG >= 1)
        this->module->dump();
#endif

        // -- Initialize LLVM compiler: --
        std::string error;

        // Helper function, gets the current machine triplet.
        llvm::Triple triple(MachineContextInfo::Triplet()); 
        const llvm::Target *target = llvm::TargetRegistry::lookupTarget("x86-64", triple, error);
        if (!target)
        {
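            // Requires <stdexcept>. The exception owns a copy of the message,
            // unlike a pointer into the local string 'error'.
            throw std::runtime_error(error);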
        }

        llvm::TargetOptions Options;
        // Options.PrintMachineCode = true;
        // Options.EnableFastISel = true;

        std::unique_ptr<llvm::TargetMachine> targetMachine(
            target->createTargetMachine(MachineContextInfo::Triplet(), MachineContextInfo::CPU(), "", Options, llvm::Reloc::Default, llvm::CodeModel::Default, llvm::CodeGenOpt::Aggressive));

        if (!targetMachine.get())
        {
            throw "Could not allocate target machine!";
        }

        // Create the target machine; set the module data layout to the correct values.
        auto DL = targetMachine->createDataLayout();
        module->setDataLayout(DL);
        module->setTargetTriple(MachineContextInfo::Triplet());

        // Pass manager builder:
        llvm::PassManagerBuilder pmbuilder;
        pmbuilder.OptLevel = 3;
        pmbuilder.BBVectorize = false;
        pmbuilder.SLPVectorize = true;
        pmbuilder.LoopVectorize = true;
        pmbuilder.Inliner = llvm::createFunctionInliningPass(3, 2);
        llvm::TargetLibraryInfoImpl *TLI = new llvm::TargetLibraryInfoImpl(triple);
        pmbuilder.LibraryInfo = TLI;

        // Generate pass managers:

        // 1. Function pass manager:
        llvm::legacy::FunctionPassManager FPM(module.get());
        pmbuilder.populateFunctionPassManager(FPM);

        // 2. Module pass manager:
        llvm::legacy::PassManager PM;
        PM.add(llvm::createTargetTransformInfoWrapperPass(targetMachine->getTargetIRAnalysis()));
        pmbuilder.populateModulePassManager(PM);

        // 3. Execute passes:
        //    - Per-function passes:
        FPM.doInitialization();
        for (llvm::Module::iterator I = module->begin(), E = module->end(); I != E; ++I)
        {
            if (!I->isDeclaration())
            {
                FPM.run(*I);
            }
        }
        FPM.doFinalization();

        //   - Per-module passes:
        PM.run(*module);

        // Fix function pointers; the PM.run will ruin them, this fixes that.
        for (auto it : externalFunctions)
        {
            auto name = it.first;
            auto fcn = module->getFunction(name);
            it.second->function = fcn;
        }

#if (LLVMDEBUG >= 2)
        // -- ASSEMBLER dump code
        // 3. Code generation pass manager:

        llvm::legacy::PassManager CGP;
        CGP.add(llvm::createTargetTransformInfoWrapperPass(targetMachine->getTargetIRAnalysis()));
        pmbuilder.populateModulePassManager(CGP);

        std::string result;
        llvm::raw_string_ostream str(result);
        llvm::buffer_ostream os(str);

        targetMachine->addPassesToEmitFile(CGP, os, llvm::TargetMachine::CodeGenFileType::CGFT_AssemblyFile);

        CGP.run(*module);

        str.flush();

        auto stringref = os.str();
        std::string assembly(stringref.begin(), stringref.end());

        std::cout << "ASM code: " << std::endl << "---------------------" << std::endl << assembly << std::endl << "---------------------" << std::endl;
        // -- end of ASSEMBLER dump code.

        for (auto it : externalFunctions)
        {
            auto name = it.first;
            auto fcn = module->getFunction(name);
            it.second->function = fcn;
        }

#endif

#if (LLVMDEBUG >= 2)
        module->dump(); 
#endif

        // All done, *RUN*.

        llvm::EngineBuilder engineBuilder(std::move(module));
        engineBuilder.setEngineKind(llvm::EngineKind::JIT);
        engineBuilder.setMCPU(MachineContextInfo::CPU());
        engineBuilder.setMArch("x86-64");
        engineBuilder.setUseOrcMCJITReplacement(false);
        engineBuilder.setOptLevel(llvm::CodeGenOpt::None);

        llvm::ExecutionEngine* engine = engineBuilder.create();

        // Define external functions
        for (auto it : externalFunctions)
        {
            auto fcn = it.second;
            if (fcn->function)
            {
                engine->addGlobalMapping(fcn->function, const_cast<void*>(fcn->FunctionPointer())); // Yuck... LLVM only takes non-const pointers
            }
        }

        // Finalize
        engine->finalizeObject();

        return engine;
    }

Update (progress)

Apparently my Skylake has problems with the vmovsd instruction. When running the same code on a Haswell (server), the test succeeds. I've checked the assembly output on both - they are exactly the same.

Just to be sure: XSAVE/XRESTORE shouldn't be a problem on Win10-x64, but let's find out anyway. I've checked the CPU features with the code from https://msdn.microsoft.com/en-us/library/hskdteyh.aspx and XSAVE/XRESTORE with the code from https://insufficientlycomplicated.wordpress.com/2011/11/07/detecting-intel-advanced-vector-extensions-avx-in-visual-studio/ . The latter runs just fine. As for the former, these are the results:

GenuineIntel
Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
3DNOW not supported
3DNOWEXT not supported
ABM not supported
ADX supported
AES supported
AVX supported
AVX2 supported
AVX512CD not supported
AVX512ER not supported
AVX512F not supported
AVX512PF not supported
BMI1 supported
BMI2 supported
CLFSH supported
CMPXCHG16B supported
CX8 supported
ERMS supported
F16C supported
FMA supported
FSGSBASE supported
FXSR supported
HLE supported
INVPCID supported
LAHF supported
LZCNT supported
MMX supported
MMXEXT not supported
MONITOR supported
MOVBE supported
MSR supported
OSXSAVE supported
PCLMULQDQ supported
POPCNT supported
PREFETCHWT1 not supported
RDRAND supported
RDSEED supported
RDTSCP supported
RTM supported
SEP supported
SHA not supported
SSE supported
SSE2 supported
SSE3 supported
SSE4.1 supported
SSE4.2 supported
SSE4a not supported
SSSE3 supported
SYSCALL supported
TBM not supported
XOP not supported
XSAVE supported

It's weird, so I figured: why not simply emit the instruction directly.

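#include <emmintrin.h> // _mm_load_sd
#include <iostream>
#include <string>

int main()
{
    const double value = 1.2;
    const double value2 = 1.3;

    auto x1 = _mm_load_sd(&value);
    auto x2 = _mm_load_sd(&value2);

    // Wait for input so the debugger can be attached and the disassembly inspected.
    std::string s;
    std::getline(std::cin, s);
}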

This code runs fine. The disassembly:

    auto x1 = _mm_load_sd(&value);
00007FF7C4833724 C5 FB 10 45 08       vmovsd      xmm0,qword ptr [value]  

    auto x1 = _mm_load_sd(&value);
00007FF7C4833729 C5 F1 57 C9          vxorpd      xmm1,xmm1,xmm1  
00007FF7C483372D C5 F3 10 C0          vmovsd      xmm0,xmm1,xmm0

Apparently the compiler doesn't load into register xmm1 here, but this still shows that the vmovsd instruction itself executes fine on this CPU.

atlaste · Accepted Answer

I just checked on another Intel Haswell what's going on here, and found this:

0000015077F20110 C5 FB 10 08          vmovsd      xmm1,qword ptr [rax]

Apparently LLVM emits a different encoding of the same instruction on the Intel Haswell than on my Skylake.

@Ha. was kind enough to point me in the right direction here. The hidden bytes do indeed encode VMOVSD, but with an EVEX prefix (the leading 0x62 byte). EVEX encoding is part of AVX512, which won't be supported until Skylake Purley in 2017. In other words, on this CPU it is an invalid instruction.
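The two encodings can be told apart by their first byte. The following is a small sketch, not taken from LLVM, that classifies the lead byte of an instruction in 64-bit mode. The names `VectorPrefix` and `classifyLeadByte` are hypothetical, and the function only looks at a single byte, so it ignores any legacy prefixes that might come first.

```cpp
#include <cstdint>

// Kinds of vector-instruction prefix distinguished by this sketch.
enum class VectorPrefix { None, Vex2, Vex3, Evex };

// In 64-bit mode these lead bytes are unambiguous:
// 0xC5 starts a two-byte VEX prefix, 0xC4 a three-byte VEX prefix,
// and 0x62 an EVEX prefix (the AVX-512 encoding).
constexpr VectorPrefix classifyLeadByte(std::uint8_t lead)
{
    return lead == 0xC5 ? VectorPrefix::Vex2
         : lead == 0xC4 ? VectorPrefix::Vex3
         : lead == 0x62 ? VectorPrefix::Evex
         : VectorPrefix::None;
}
```

Applied to the bytes in this post, the Haswell output 'C5 FB 10 08' starts with a two-byte VEX prefix, while the Skylake output '62 f1 ff 08 10 09' starts with an EVEX prefix. That is why the same vmovsd faults only on the machine without AVX512.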

To check, I've put a breakpoint in X86MCCodeEmitter::EmitMemModRMByte. At some point I do see a bool HasEVEX = [...] evaluating to true. This confirms that the codegen / emitter is producing the wrong output.

My conclusion is therefore that this has to be a bug in LLVM's target information for Skylake CPUs. That leaves two things to do: figure out where exactly the bug is in LLVM so it can be fixed, and report it to the LLVM team.

So where is it in LLVM? That's tough to tell. x86.td.def defines the skylake features as 'FeatureAVX512', which probably sets X86SSELevel to AVX512F, and that in turn selects the wrong instruction encodings. As a workaround, it's best to simply tell LLVM that we have an Intel Haswell instead, and all will be well:

// MCPU is used to call createTargetMachine
llvm::StringRef MCPU = llvm::sys::getHostCPUName();
if (MCPU.str() == "skylake")
{
    MCPU = llvm::StringRef("haswell");
}

Tested; it works.
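A slightly more defensive variant of this workaround is to downgrade only when the host really lacks AVX512F. The sketch below assumes the caller has already read EBX from CPUID leaf 7, subleaf 0, where bit 16 reports AVX512F. The helper names `hasAvx512f` and `selectCodegenCpu` are hypothetical and not part of LLVM.

```cpp
#include <cstdint>
#include <string>

// CPUID leaf 7, subleaf 0: bit 16 of EBX reports AVX512F support.
constexpr bool hasAvx512f(std::uint32_t leaf7Ebx)
{
    return ((leaf7Ebx >> 16) & 1u) != 0;
}

// Hypothetical helper: returns the CPU name to hand to createTargetMachine.
// A host reported as "skylake" without AVX512F is treated as "haswell",
// mirroring the workaround above; every other name passes through unchanged.
std::string selectCodegenCpu(const std::string& hostCpu, std::uint32_t leaf7Ebx)
{
    if (hostCpu == "skylake" && !hasAvx512f(leaf7Ebx))
    {
        return "haswell";
    }
    return hostCpu;
}
```

On MSVC the EBX value could be obtained with __cpuidex from <intrin.h>. Note that the CPUID bit alone is not the whole story: the operating system must also enable the AVX512 register state, so treat this as an illustration rather than a complete detection routine.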
