
Writing LLVM byte code

Tags: bytecode, llvm

I have just discovered LLVM and don't know much about it yet. I have been trying it out using LLVM in the browser. I can see that any C code I write is converted to LLVM byte code, which is then converted to native code. The page shows a textual representation of the byte code. For example, for the following C code:

int array[] = { 1, 2, 3};

int foo(int X) {
  return array[X];
}

It shows the following byte code:

target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-linux-gnu"

@array = global [3 x i32] [i32 1, i32 2, i32 3]   ; <[3 x i32]*> [#uses=1]

define i32 @foo(i32 %X) nounwind readonly {
entry:
  %0 = sext i32 %X to i64                         ; <i64> [#uses=1]
  %1 = getelementptr inbounds [3 x i32]* @array, i64 0, i64 %0 ; <i32*> [#uses=1]
  %2 = load i32* %1, align 4                      ; <i32> [#uses=1]
  ret i32 %2
}

My question is: can I write the byte code myself and give it to the LLVM assembler to convert to native code, skipping the first step of writing C code altogether? If so, how do I do it? Does anyone have any pointers for me?

341008 asked Mar 24 '11 09:03






2 Answers

One very important feature (and design goal) of the LLVM IR language is its 3-way representation:

  • The textual representation you can see here
  • The bitcode representation (the binary form, historically called bytecode)
  • The in-memory representation

All three are completely interchangeable: anything that can be expressed in one can be expressed in the other two.

Therefore, as long as you conform to the syntax, you can indeed write the IR yourself. It is rather pointless though, except as an exercise to familiarize yourself with the format, whether to get better at reading (and diagnosing) the IR or to produce your own compiler :)
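To make the interchangeability concrete, here is a minimal sketch of a round trip from the textual form to the binary form and back. It assumes the standard LLVM tools `llvm-as` and `llvm-dis` are on the PATH under those exact names (some distributions install them with a version suffix). The file names and the `@add` function are made up for illustration and are not from the question.

```shell
#!/bin/sh
# Sketch: round-trip hand-written IR between the textual and binary forms.
# add.ll, add.bc and roundtrip.ll are arbitrary, illustrative file names.
set -e

# Hand-written textual IR: a function that adds two 32-bit integers.
cat > add.ll <<'EOF'
define i32 @add(i32 %a, i32 %b) {
entry:
  %sum = add i32 %a, %b
  ret i32 %sum
}
EOF

if command -v llvm-as >/dev/null 2>&1 && command -v llvm-dis >/dev/null 2>&1; then
  llvm-as add.ll -o add.bc          # textual IR -> binary form
  llvm-dis add.bc -o roundtrip.ll   # binary form -> textual IR
  grep 'define i32 @add' roundtrip.ll   # the function survives the round trip
else
  echo "llvm-as/llvm-dis not found on PATH; add.ll was written but not assembled"
fi
```

The disassembled file usually carries a few extra comment lines (such as a module ID), but the function definition itself should come back unchanged.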

Matthieu M. answered Sep 24 '22 03:09



Yes, you certainly can. You can write LLVM IR by hand: tools such as llc (which generates native code for you) and opt (the LLVM IR => LLVM IR optimizer) all accept the textual representation of LLVM IR as input.
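As a rough sketch of that workflow, the script below writes a small IR file by hand, runs it through opt, and then hands the result to llc to get native assembly. It assumes `opt` and `llc` are on the PATH under those names. The file names and the `@add` and `@main` functions are hypothetical. The IR deliberately avoids pointers, because pointer syntax differs between the old LLVM release shown in the question and current releases.

```shell
#!/bin/sh
# Sketch: hand-written IR -> optimized IR -> native assembly.
# answer.ll, answer.opt.ll and answer.s are arbitrary, illustrative names.
set -e

# Hand-written textual IR: main calls add(40, 2) and returns the result.
cat > answer.ll <<'EOF'
define i32 @add(i32 %a, i32 %b) {
entry:
  %sum = add i32 %a, %b
  ret i32 %sum
}

define i32 @main() {
entry:
  %r = call i32 @add(i32 40, i32 2)
  ret i32 %r
}
EOF

if command -v opt >/dev/null 2>&1 && command -v llc >/dev/null 2>&1; then
  opt -S -O2 answer.ll -o answer.opt.ll   # IR -> optimized IR (-S keeps it textual)
  llc answer.opt.ll -o answer.s           # IR -> assembly for the host target
  echo "wrote answer.opt.ll and answer.s"
else
  echo "opt/llc not found on PATH; answer.ll was written but not compiled"
fi
```

From there, a system compiler driver can typically assemble and link the result (for example `clang answer.s -o answer`), and `lli answer.ll` should be able to run the IR directly without producing a native file.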

Anton Korobeynikov answered Sep 24 '22 03:09
