
Re-generating source code from LLVM parse tree?

I'm curious whether there are any projects that can take an LLVM parse tree and regenerate source code from it. I'm thinking of C/C++ in particular.

gct asked Apr 25 '14 15:04




1 Answer

If "LLVM parse tree" means the Clang AST

Yes, you can regenerate source from Clang's AST. Some references:

  • Basic source-to-source transformation with Clang by Eli Bendersky, 2012
  • Modern source-to-source transformation with Clang and libTooling by Eli Bendersky, 2014
  • Performing Source-to-Source Transformations with Clang (Slides)
  • SoSlang: SOurce-to-Source Clang (Slides)
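
To make the idea concrete, here is a minimal sketch of what "regenerating source from an AST" means: each node knows how to print itself, and printing the root walks the whole tree. The node types (`Expr`, `IntLit`, `Var`, `BinOp`) and the helper `sampleTree` are hypothetical toy classes invented for this illustration. They are not Clang's AST classes, and the tools linked above work quite differently (they typically rewrite the original source text rather than print the tree from scratch).

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical toy AST (NOT Clang's classes): just enough to show unparsing.
struct Expr {
    virtual ~Expr() = default;
    // Render this subtree back into source text.
    virtual std::string unparse() const = 0;
};

struct IntLit : Expr {
    int value;
    explicit IntLit(int v) : value(v) {}
    std::string unparse() const override { return std::to_string(value); }
};

struct Var : Expr {
    std::string name;
    explicit Var(std::string n) : name(std::move(n)) {}
    std::string unparse() const override { return name; }
};

struct BinOp : Expr {
    char op;
    std::unique_ptr<Expr> lhs, rhs;
    BinOp(char o, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
        : op(o), lhs(std::move(l)), rhs(std::move(r)) {
        assert(lhs && rhs);  // both operands must be present
    }
    // Parenthesize every binary node so precedence never has to be inferred.
    std::string unparse() const override {
        return "(" + lhs->unparse() + " " + op + " " + rhs->unparse() + ")";
    }
};

// Builds the tree for (a + 2) * b by hand; a real front end would parse it.
std::unique_ptr<Expr> sampleTree() {
    auto sum = std::make_unique<BinOp>('+', std::make_unique<Var>("a"),
                                       std::make_unique<IntLit>(2));
    return std::make_unique<BinOp>('*', std::move(sum),
                                   std::make_unique<Var>("b"));
}
```

Calling `sampleTree()->unparse()` yields `((a + 2) * b)`. The regenerated text is equivalent to the input but not identical to it: comments, whitespace and macros are gone by the time a tree exists, which is why practical tools tend to patch the original buffer instead. Clang also ships its own AST pretty-printer, reachable with `clang -Xclang -ast-print -fsyntax-only file.c`, though its output is not guaranteed to compile unchanged.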

If "LLVM parse tree" means LLVM IR

Several projects have generated source code from LLVM IR. The first, the "C back-end", was dropped in LLVM 3.1.

There are now several projects that generate C from LLVM IR:

  • Resurrected "C back-end" by Roel Jordans

    [LLVMdev] [RFC] Resurrecting the C back-end (mailing list post), via the cited Phoronix news article

  • "C++ -> LLVM IR -> Emscripten -> asm.js -> C" chain

    Prototype of an LLVM IR => C compiler ("c backend"), via LLVM Weekly - #15

osgx answered Sep 19 '22 13:09