How can I implement a string data type in LLVM?

I have been looking at LLVM lately, and I find it to be quite an interesting architecture. However, looking through the tutorial and the reference material, I can't see any examples of how I might implement a string data type.

There is a lot of documentation about integers, reals, and other number types, and even arrays, functions and structures, but AFAIK nothing about strings. Would I have to add a new data type to the backend? Is there a way to use built-in data types? Any insight would be appreciated.

a_m0d asked Jun 30 '09 04:06

2 Answers

What is a string? An array of characters.

What is a character? An integer.

So while I'm no LLVM expert by any means, I would guess that if, e.g., you wanted to represent some 8-bit character set, you'd use an array of i8 (8-bit integers), or a pointer to i8. And indeed, if we have a simple hello world C program:

    #include <stdio.h>

    int main() {
            puts("Hello, world!");
            return 0;
    }

And we compile it using llvm-gcc and dump the generated LLVM assembly:

    $ llvm-gcc -S -emit-llvm hello.c
    $ cat hello.s
    ; ModuleID = 'hello.c'
    target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128"
    target triple = "x86_64-linux-gnu"
    @.str = internal constant [14 x i8] c"Hello, world!\00"         ; <[14 x i8]*> [#uses=1]

    define i32 @main() {
    entry:
            %retval = alloca i32            ; <i32*> [#uses=2]
            %tmp = alloca i32               ; <i32*> [#uses=2]
            %"alloca point" = bitcast i32 0 to i32          ; <i32> [#uses=0]
            %tmp1 = getelementptr [14 x i8]* @.str, i32 0, i64 0            ; <i8*> [#uses=1]
            %tmp2 = call i32 @puts( i8* %tmp1 ) nounwind            ; <i32> [#uses=0]
            store i32 0, i32* %tmp, align 4
            %tmp3 = load i32* %tmp, align 4         ; <i32> [#uses=1]
            store i32 %tmp3, i32* %retval, align 4
            br label %return

    return:         ; preds = %entry
            %retval4 = load i32* %retval            ; <i32> [#uses=1]
            ret i32 %retval4
    }

    declare i32 @puts(i8*)

Notice the reference to the puts function declared at the end of the file. In C, puts is

int puts(const char *s) 

In LLVM, it is

i32 @puts(i8*) 

The correspondence should be clear.
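To see the same correspondence from the C side, here is a minimal sketch. The names `hello`, `hello_storage_size`, and `hello_first_byte` are made up for illustration, and it assumes a C11 compiler on a platform with 8-bit, ASCII-compatible `char`. The compile-time checks show why the global in the dump is `[14 x i8]`: 13 visible characters plus the terminating NUL, each of them a small integer.

```c
#include <limits.h>
#include <stddef.h>

/* The same literal as in hello.c, stored as an array of char. */
static const char hello[] = "Hello, world!";

/* Assumption: char is 8 bits wide on this platform, which is what i8 models. */
_Static_assert(CHAR_BIT == 8, "char is expected to be 8 bits");

/* 13 visible characters plus the terminating NUL gives the 14 in [14 x i8]. */
_Static_assert(sizeof hello == 14, "literal occupies 14 bytes");

/* Size in bytes of the array that backs the literal. */
size_t hello_storage_size(void) { return sizeof hello; }

/* A char is just a small integer; in ASCII, 'H' has the value 72. */
int hello_first_byte(void) { return hello[0]; }
```

If either `_Static_assert` fails to compile, the platform does not match the assumptions above and the mapping to `i8` would need a second look.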

As an aside, the generated LLVM is very verbose here because I compiled without optimizations. If you turn those on, the unnecessary instructions disappear:

    $ llvm-gcc -O2 -S -emit-llvm hello.c
    $ cat hello.s
    ; ModuleID = 'hello.c'
    target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128"
    target triple = "x86_64-linux-gnu"
    @.str = internal constant [14 x i8] c"Hello, world!\00"         ; <[14 x i8]*> [#uses=1]

    define i32 @main() nounwind  {
    entry:
            %tmp2 = tail call i32 @puts( i8* getelementptr ([14 x i8]* @.str, i32 0, i64 0) ) nounwind              ; <i32> [#uses=0]
            ret i32 0
    }

    declare i32 @puts(i8*)
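The `c"Hello, world!\00"` initializer in both dumps is just the bytes of the string, with non-printable bytes written as a backslash followed by two hex digits. As a rough sketch of what a front end that prints textual IR might do, the following C helper formats a byte buffer as such a global. The function names `format_ir_string` and `format_ir_cstring` are hypothetical and not part of any LLVM API, and the `internal constant` wording simply mirrors the dump above.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "@name = internal constant [N x i8] c"..."" into
 * out, where N is len + 1 because a terminating \00 is appended.
 * Returns the number of characters written, or -1 if out is too small.
 */
int format_ir_string(char *out, size_t cap, const char *name,
                     const char *data, size_t len)
{
    int n = snprintf(out, cap, "@%s = internal constant [%zu x i8] c\"",
                     name, len + 1);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    size_t pos = (size_t)n;

    /* The iteration with i == len emits the terminating NUL byte. */
    for (size_t i = 0; i <= len; i++) {
        unsigned char c = (i < len) ? (unsigned char)data[i] : 0;
        if (isprint(c) && c != '"' && c != '\\') {
            if (pos + 1 >= cap)
                return -1;
            out[pos++] = (char)c;
        } else {
            /* Quotes, backslashes and non-printable bytes become \XX. */
            if (pos + 3 >= cap)
                return -1;
            snprintf(out + pos, cap - pos, "\\%02X", (unsigned)c);
            pos += 3;
        }
    }

    if (pos + 1 >= cap)
        return -1;
    out[pos++] = '"';
    out[pos] = '\0';
    return (int)pos;
}

/* Convenience wrapper for ordinary NUL-terminated C strings. */
int format_ir_cstring(char *out, size_t cap, const char *name, const char *s)
{
    return format_ir_string(out, cap, name, s, strlen(s));
}
```

For example, `format_ir_cstring(buf, sizeof buf, ".str", "Hello, world!")` fills `buf` with the same `@.str` definition that appears in the dumps, minus the trailing comment.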
Jason Creighton answered Sep 29 '22 11:09



[To follow up on other answers which explain what strings are, here is some implementation help]

Using the C interface, the calls you'll want are something like:

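    #include <llvm-c/Core.h>

    // Assumption: `mod` is the LLVMModuleRef being built, created elsewhere.
    extern LLVMModuleRef mod;

    LLVMValueRef llvmGenLocalStringVar(const char* data, int len) {
      LLVMValueRef glob = LLVMAddGlobal(mod, LLVMArrayType(LLVMInt8Type(), len), "string");

      // set as internal linkage and constant
      LLVMSetLinkage(glob, LLVMInternalLinkage);
      LLVMSetGlobalConstant(glob, 1);

      // Initialize with string. The last argument (DontNullTerminate) is true,
      // so the initializer is exactly len bytes and matches the array type.
      LLVMSetInitializer(glob, LLVMConstString(data, len, 1));

      return glob;
    }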
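Note that this function takes an explicit length and passes a true value as the last argument of `LLVMConstString`, so the global holds exactly `len` bytes with no terminating NUL. The length therefore has to be tracked somewhere else. One possible design, offered here only as an assumption and not something the answers above prescribe, is to represent a string value as a length plus a pointer to the bytes, which in LLVM terms would be a struct built from an integer and an `i8` pointer. A plain C sketch of that layout, with hypothetical names `str_t`, `str_from_bytes`, `str_from_cstr`, and `str_equal`:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical string value: byte count plus pointer to the first byte. */
typedef struct {
    size_t len;
    const char *data;
} str_t;

/* Wrap an arbitrary byte buffer; embedded NUL bytes are allowed. */
str_t str_from_bytes(const char *data, size_t len)
{
    str_t s = { len, data };
    return s;
}

/* Wrap a NUL-terminated C string; the terminator is not counted. */
str_t str_from_cstr(const char *s)
{
    return str_from_bytes(s, strlen(s));
}

/* Two strings are equal when their lengths and bytes match. */
int str_equal(str_t a, str_t b)
{
    return a.len == b.len && memcmp(a.data, b.data, a.len) == 0;
}
```

With this layout, getting the length is a field read, and comparing strings does not depend on where a NUL byte happens to fall.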
David Gardner answered Sep 29 '22 12:09
