
How can I disassemble the result of LLVM MCJIT compilation?

I have a program I wrote which uses LLVM 3.5 as a JIT compiler, which I'm trying to update to use MCJIT in LLVM 3.7. I have it mostly working, but I'm struggling to reproduce one debug-only feature I implemented with LLVM 3.5.

I would like to be able to see the host machine code (e.g. x86, x64 or ARM, not LLVM IR) generated by the JIT process; in debug builds I log this out as my program is running. With LLVM 3.5 I was able to do this by invoking ExecutionEngine::runJITOnFunction() to fill in a llvm::MachineCodeInfo object, which gave me the start address and size of the generated code. I could then disassemble that code.

I can't seem to find any equivalent in MCJIT. I can get the start address of the function (e.g. via getPointerToFunction()) but not the size.

I have seen the question "Disassemble Memory", but apart from its answers not having much detail, it seems to be about how to disassemble a sequence of bytes. I know how to do that; my question is: how can I get hold of the sequence of bytes in the first place?

If it would help to make this more concrete, please reinterpret this question as: "How can I extend the example Kaleidoscope JIT to show the machine code (x86, ARM, etc) it produces, not just the LLVM IR?"

Thanks.

Steve F asked Nov 09 '22 23:11


1 Answer

You have at least two options here.

  1. Supply your own memory manager. This should be well documented, and many projects that use MCJIT do it. For the sake of completeness, here's the code:

    #include <cstdint>
    #include <memory>
    #include "llvm/ExecutionEngine/RTDyldMemoryManager.h"

    class MCJITMemoryManager : public llvm::RTDyldMemoryManager {
    public:
      static std::unique_ptr<MCJITMemoryManager> Create();

      MCJITMemoryManager();
      virtual ~MCJITMemoryManager();

      // Allocate a memory block of (at least) the given size suitable for
      // executable code. The section_id is a unique identifier assigned by the
      // MCJIT engine, and optionally recorded by the memory manager to access a
      // loaded section.
      uint8_t* allocateCodeSection(uintptr_t size, unsigned alignment,
                                   unsigned section_id,
                                   llvm::StringRef section_name) override;

      // Allocate a memory block of (at least) the given size suitable for data.
      // The section_id is a unique identifier assigned by the MCJIT engine, and
      // optionally recorded by the memory manager to access a loaded section.
      uint8_t* allocateDataSection(uintptr_t size, unsigned alignment,
                                   unsigned section_id,
                                   llvm::StringRef section_name,
                                   bool is_readonly) override;
      ...
    };
    

    Pass a memory manager instance to EngineBuilder:

    std::unique_ptr<MCJITMemoryManager> manager = MCJITMemoryManager::Create();
    llvm::ExecutionEngine* raw = llvm::EngineBuilder(std::move(module))
        .setMCJITMemoryManager(std::move(manager))
        ...
        .create();
    

    Through these callbacks you control the memory where the code gets emitted, and the size is passed directly to your method. Remember the address and size of the buffer you allocated for the code section; then either stop the program in gdb and disassemble that memory, dump it somewhere, or feed it to LLVM's disassembler.

  2. Run llc on your LLVM IR with the appropriate options (optimization level, etc.). As I see it, MCJIT is named that way for a reason: it reuses the existing code generation modules, the same ones llc uses.
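
For option 1, the bookkeeping behind "remember the address of the buffer" might look like the sketch below. It deliberately leaves LLVM out so that it compiles on its own; `SectionRecord`, `CodeSectionLog` and `HexDump` are hypothetical names chosen for illustration, not part of LLVM. The assumption is that your `allocateCodeSection()` override calls `Record()` just before returning the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// One entry per section handed out by the (hypothetical) memory manager.
struct SectionRecord {
    const std::uint8_t* address;  // start of the buffer returned to the JIT
    std::size_t size;             // size the JIT asked for
    unsigned section_id;          // identifier passed in by the JIT
    std::string name;             // section name, e.g. ".text"
};

// Hypothetical helper: remembers every code section so it can be
// looked up and dumped later.
class CodeSectionLog {
public:
    void Record(const std::uint8_t* address, std::size_t size,
                unsigned section_id, const std::string& name) {
        assert(address != nullptr);
        records_.push_back({address, size, section_id, name});
    }

    // Returns the recorded section that contains addr, or nullptr if none.
    const SectionRecord* Find(const void* addr) const {
        const auto p = reinterpret_cast<std::uintptr_t>(addr);
        for (const SectionRecord& r : records_) {
            const auto begin = reinterpret_cast<std::uintptr_t>(r.address);
            if (p >= begin && p < begin + r.size) return &r;
        }
        return nullptr;
    }

    const std::vector<SectionRecord>& records() const { return records_; }

private:
    std::vector<SectionRecord> records_;
};

// Formats bytes as lowercase hex, 16 bytes per line, for logging or for
// pasting into an external disassembler.
std::string HexDump(const std::uint8_t* data, std::size_t size) {
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < size; ++i) {
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(data[i]));
        out += buf;
        out += ((i + 1) % 16 == 0 || i + 1 == size) ? '\n' : ' ';
    }
    return out;
}
```

With something like this in place, the address returned by `getPointerToFunction()` can be passed to `Find()` to locate the section it lives in. Note that MCJIT compiles a whole module at a time, so a code section normally holds every function in that module; the recorded size is an upper bound for any single function rather than its exact length.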
Vladislav Ivanishin answered Dec 20 '22 04:12