
What is the size of a program using LLVM/CLANG for a custom bytecode VM?

I'm evaluating different possibilities for a custom VM, and I left out LLVM from another question. Since I'm still working on the evaluation of embedded language VMs I can't test/check this myself for now. I would like to know the following information about LLVM/CLANG:

  • Is it supported well on Windows? Or Solaris? (cross-platform is a plus for me)
  • If I want to write my own/custom VM with a C-like language, what would I need to include in the project? (LLVM/CLANG sections or components, etc)
  • I would keep the compiler separate from the VM for obvious reasons (not writing an interpreter). What would be the size of the required components? Could I build them 'in' the program instead of dynamically linking to them?
  • Can I avoid JIT? I would like to have a bytecode VM which does not necessarily translate to native code. This would help when JIT is not supported on the platform (e.g. systems with restrictive memory permissions that do not allow wx/rwx memory mappings).

I know that the ClamAV antivirus, for instance, uses bytecode backed by LLVM/Clang to support dynamic/runtime signatures. However, I do not know whether there is an existing facility for implementing this, or tutorials or documentation that guide you through the process.

Thanks! :)

soze asked Mar 12 '11 04:03



1 Answer

Clang is a parser for C-like languages, including C++. If your language is C-like enough (i.e., Java is not), then you could add support for it to Clang, which already knows how to produce LLVM IR.

LLVM does not require JIT, and is normally statically linked. LLVM provides libraries that perform optimization and code generation of LLVM IR. To JIT is just to generate code into memory instead of onto disk. The ordinary usage of Clang+LLVM is as a drop-in replacement for GCC, generating code to .o files.
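If the platform forbids executable memory mappings and shipping ahead-of-time native code is not an option either, the runtime side has to interpret a bytecode of your own design. As a rough sketch of that interpreter side only, assuming a hypothetical stack-based instruction set (the names `Op`, `Instr`, and `run` are made up for illustration and are not part of LLVM or Clang):

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical instruction set, for illustration only; not an LLVM format.
enum class Op : std::uint8_t { Push, Add, Mul, Halt };

struct Instr {
    Op op;
    std::int64_t operand;  // only meaningful for Op::Push
};

// Interprets the program on an operand stack and returns the value on top of
// the stack when Halt is reached. Nothing is translated to native code, so no
// writable+executable memory mapping is needed.
std::int64_t run(const std::vector<Instr>& program) {
    std::vector<std::int64_t> stack;

    // Pops one value, rejecting malformed programs instead of reading garbage.
    auto pop = [&stack]() -> std::int64_t {
        if (stack.empty()) throw std::runtime_error("stack underflow");
        std::int64_t v = stack.back();
        stack.pop_back();
        return v;
    };

    for (std::size_t pc = 0; pc < program.size(); ++pc) {
        const Instr& in = program[pc];
        switch (in.op) {
        case Op::Push:
            stack.push_back(in.operand);
            break;
        case Op::Add: {
            std::int64_t b = pop();
            std::int64_t a = pop();
            stack.push_back(a + b);
            break;
        }
        case Op::Mul: {
            std::int64_t b = pop();
            std::int64_t a = pop();
            stack.push_back(a * b);
            break;
        }
        case Op::Halt:
            return pop();
        }
    }
    throw std::runtime_error("program ended without Halt");
}
```

With this sketch, a program that pushes 2 and 3, adds them, pushes 4, multiplies, and halts evaluates (2 + 3) * 4. The compiler that produces such bytecode stays a separate tool, as the question intends.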

How big it will be depends on what you need. Do you want all the optimizations? Do you want all the targets? (Unlike GCC, LLVM can be built with as many backends in one binary as you want.) Since you mentioned embedded, one example is Android using LLVM on cell phones: http://android-developers.blogspot.com/2011/02/introducing-renderscript.html

Windows is supported rather well: you can build LLVM with MSVC++ using our CMake build system, or with mingw32. Solaris support is more iffy; we used to get periodic patches to fix it up, but I haven't seen any for a while.

Finally, you may want to read the tutorial at http://llvm.org/docs/tutorial. It chronicles the construction of a JITted REPL language, but the basis is the same for a statically compiled language. Instead of using an llvm::JIT object, you call Target.addPassesToEmitFile and hand it the output stream to write to. See llvm/tools/llc/llc.cpp for a fully worked example (it's lengthy; only a small fraction of it is needed if you don't want to support all the options that llc does).

Nick Lewycky answered Oct 12 '22 09:10
