
LLVM intrinsic functions

Tags:

llvm

When building a project with LLVM, some function calls will be replaced by intrinsic functions. Is the replacement completed by the front-end (e.g. clang) or the LLVM back-end?

Discussions through the Internet indicate that the intrinsic functions replacement is related to optimization options. So does it mean if there is no optimization option, then no intrinsic replacement will happen? Or in fact, there are some default intrinsic functions replacement that cannot be disabled?

If there is any method to disable all the intrinsic functions, how should I do that?

Junxzm asked Dec 16 '14 18:12




1 Answer

It depends. Intrinsics written explicitly in the source code are emitted directly by the front-end. Intrinsics like llvm.memset are introduced during optimization at the IR level (neither the front-end nor the back-end performs these optimizations).
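As a small sketch of the first case, here is a builtin called directly in the source. The function name count_bits is made up for illustration, and the exact intrinsic name may differ between clang versions and targets; with clang I would expect the front-end to turn the builtin into a call to llvm.ctpop.i32 even without any optimization flag.

```c
/* Counts the set bits of x using a compiler builtin (GCC/clang extension).
   Assumption: clang's front-end translates this builtin into the
   llvm.ctpop.i32 intrinsic on its own, before any optimization pass runs. */
unsigned count_bits(unsigned int x)
{
        return (unsigned)__builtin_popcount(x);
}
```

Compiling this with clang -S -emit-llvm and searching the output for "llvm.ctpop" is a quick way to check what your clang version actually emits.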

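Here is a (quite stupid) example. Note that the second loop reads a[8] on its first iteration, which is out of bounds; the code is left as it is because the IR listings below were generated from it: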

int main(int argc, char** argv)
{
        int a[8];

        for (int i = 0; i != 8; ++i)
                a[i] = 0;

        for (int i = 7; i >= 0; --i)
                a[i] = a[i+1] + argc;

        return a[0];
}

Compiled with clang 3.5 (clang -S -emit-llvm), this gives the following IR without any intrinsics:

; Function Attrs: nounwind uwtable
define i32 @main(i32 %argc, i8** %argv) #0 {
  %1 = alloca i32, align 4
  %2 = alloca i32, align 4
  %3 = alloca i8**, align 8
  %a = alloca [8 x i32], align 16
  %i = alloca i32, align 4
  %i1 = alloca i32, align 4
  store i32 0, i32* %1
  store i32 %argc, i32* %2, align 4
  store i8** %argv, i8*** %3, align 8
  store i32 0, i32* %i, align 4
  br label %4

; <label>:4                                       ; preds = %11, %0
  %5 = load i32* %i, align 4
  %6 = icmp ne i32 %5, 8
  br i1 %6, label %7, label %14

; <label>:7                                       ; preds = %4
  %8 = load i32* %i, align 4
  %9 = sext i32 %8 to i64
  %10 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 %9
  store i32 0, i32* %10, align 4
  br label %11

; <label>:11                                      ; preds = %7
  %12 = load i32* %i, align 4
  %13 = add nsw i32 %12, 1
  store i32 %13, i32* %i, align 4
  br label %4

; <label>:14                                      ; preds = %4
  store i32 7, i32* %i1, align 4
  br label %15

; <label>:15                                      ; preds = %29, %14
  %16 = load i32* %i1, align 4
  %17 = icmp sge i32 %16, 0
  br i1 %17, label %18, label %32

; <label>:18                                      ; preds = %15
  %19 = load i32* %i1, align 4
  %20 = add nsw i32 %19, 1
  %21 = sext i32 %20 to i64
  %22 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 %21
  %23 = load i32* %22, align 4
  %24 = load i32* %2, align 4
  %25 = add nsw i32 %23, %24
  %26 = load i32* %i1, align 4
  %27 = sext i32 %26 to i64
  %28 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 %27
  store i32 %25, i32* %28, align 4
  br label %29

; <label>:29                                      ; preds = %18
  %30 = load i32* %i1, align 4
  %31 = add nsw i32 %30, -1
  store i32 %31, i32* %i1, align 4
  br label %15

; <label>:32                                      ; preds = %15
  %33 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 0
  %34 = load i32* %33, align 4
  ret i32 %34
}

Compiled again with clang -emit-llvm -O1 you will see this:

; Function Attrs: nounwind readnone uwtable
define i32 @main(i32 %argc, i8** nocapture readnone %argv) #0 {
.preheader:
  %a = alloca [8 x i32], align 16
  %a6 = bitcast [8 x i32]* %a to i8*
  call void @llvm.memset.p0i8.i64(i8* %a6, i8 0, i64 32, i32 4, i1 false)
  br label %0

; <label>:0                                       ; preds = %.preheader, %0
  %indvars.iv = phi i64 [ 7, %.preheader ], [ %indvars.iv.next, %0 ]
  %1 = add nsw i64 %indvars.iv, 1
  %2 = getelementptr inbounds [8 x i32]* %a, i64 0, i64 %1
  %3 = load i32* %2, align 4, !tbaa !1
  %4 = add nsw i32 %3, %argc
  %5 = getelementptr inbounds [8 x i32]* %a, i64 0, i64 %indvars.iv
  store i32 %4, i32* %5, align 4, !tbaa !1
  %indvars.iv.next = add nsw i64 %indvars.iv, -1
  %6 = trunc i64 %indvars.iv to i32
  %7 = icmp sgt i32 %6, 0
  br i1 %7, label %0, label %8

; <label>:8                                       ; preds = %0
  %9 = getelementptr inbounds [8 x i32]* %a, i64 0, i64 0
  %10 = load i32* %9, align 16, !tbaa !1
  ret i32 %10
}

The initialization loop was replaced by the llvm.memset intrinsic. The back-end is free to handle the intrinsic as it wants, but commonly llvm.memset is lowered to a libc library call.
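In C terms, the rewrite corresponds roughly to the sketch below. The two helper names are hypothetical and only there to put the forms side by side; the 32 bytes in the IR are 8 elements times 4 bytes per int.

```c
#include <string.h>

#define N 8

/* The initialization loop as written in the example. */
void zero_with_loop(int a[N])
{
        for (int i = 0; i != N; ++i)
                a[i] = 0;
}

/* What the optimized IR expresses instead: a single memset over
   N * sizeof(int) bytes (32 bytes when int is 4 bytes wide). */
void zero_with_memset(int a[N])
{
        memset(a, 0, N * sizeof(int));
}
```

Both functions leave the array in the same state, which is why the optimizer is allowed to swap one for the other.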

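To answer your question about optimization: yes, if you don't optimize your code, no intrinsics are introduced in place of code like the loop above. The front-end can still emit intrinsics of its own at that point, for example for builtins called in the source or llvm.memcpy for some aggregate copies.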
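One caveat is easy to check yourself: some intrinsics come from the front-end and show up without any optimization. A minimal sketch, assuming a hypothetical struct that is large enough that clang copies it as a block rather than field by field:

```c
/* Hypothetical aggregate used only for this illustration. */
struct samples {
        int values[64];
};

/* Plain struct assignment. Assumption: clang typically emits this copy as
   a call to the llvm.memcpy intrinsic, even when no -O flag is given. */
void copy_samples(struct samples *dst, const struct samples *src)
{
        *dst = *src;
}
```

If the unoptimized IR for this contains a call to an llvm.memcpy variant, it was put there by the front-end, not by an optimization pass.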

To prevent intrinsics from being introduced into your code, you have to find the optimization pass that introduces them and not run it on your IR. A related question explains how to find out which passes are run on the IR: Where to find the optimization sequence for clang -OX?

For -O1 we get:

-prune-eh -inline-cost -always-inline -functionattrs -sroa -domtree -early-cse -lazy-value-info -jump-threading -correlated-propagation -simplifycfg -instcombine -tailcallelim -simplifycfg -reassociate -domtree -loops -loop-simplify -lcssa -loop-rotate -licm -loop-unswitch -instcombine -scalar-evolution -lcssa -indvars -loop-idiom -loop-deletion -loop-unroll -memdep -memcpyopt -sccp -instcombine -lazy-value-info -jump-threading -correlated-propagation -domtree -memdep -dse -adce -simplifycfg -instcombine -barrier -domtree -loops -loop-simplify -lcssa -branch-prob -block-freq -scalar-evolution -loop-vectorize -instcombine -simplifycfg -strip-dead-prototypes -verify

A wild guess: instcombine is introducing the llvm.memset. I ran opt on the unoptimized IR with the pass list above minus instcombine and got this:

; Function Attrs: nounwind readnone uwtable
define i32 @main(i32 %argc, i8** %argv) #0 {
  %a = alloca [8 x i32], align 16
  %1 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 8
  %2 = load i32* %1, align 4
  %3 = add nsw i32 %2, %argc
  %4 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 7
  store i32 %3, i32* %4, align 4
  %5 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 7
  %6 = load i32* %5, align 4
  %7 = add nsw i32 %6, %argc
  %8 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 6
  store i32 %7, i32* %8, align 4
  %9 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 6
  %10 = load i32* %9, align 4
  %11 = add nsw i32 %10, %argc
  %12 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 5
  store i32 %11, i32* %12, align 4
  %13 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 5
  %14 = load i32* %13, align 4
  %15 = add nsw i32 %14, %argc
  %16 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 4
  store i32 %15, i32* %16, align 4
  %17 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 4
  %18 = load i32* %17, align 4
  %19 = add nsw i32 %18, %argc
  %20 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 3
  store i32 %19, i32* %20, align 4
  %21 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 3
  %22 = load i32* %21, align 4
  %23 = add nsw i32 %22, %argc
  %24 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 2
  store i32 %23, i32* %24, align 4
  %25 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 2
  %26 = load i32* %25, align 4
  %27 = add nsw i32 %26, %argc
  %28 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 1
  store i32 %27, i32* %28, align 4
  %29 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 1
  %30 = load i32* %29, align 4
  %31 = add nsw i32 %30, %argc
  %32 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 0
  store i32 %31, i32* %32, align 4
  %33 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 0
  %34 = load i32* %33, align 4
  ret i32 %34
}

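No intrinsics. So to prevent (at least the memset) intrinsics in your code, don't run instcombine on your IR. However, instcombine is a powerful opt pass that really shortens the code. Keep in mind that this was only a guess: the pass list above also contains loop-idiom, the pass whose job is to recognize loops like this one and turn them into memset or memcpy calls, so if the intrinsic still shows up, that pass is the next candidate to drop.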

Now you have two options:

  1. don't use opt passes that introduce intrinsics
  2. write your own LLVM opt pass that transforms intrinsics back into whatever they replaced, and run it after optimization and before the back-end starts working
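If changing the pass pipeline is not practical, a source-level workaround in the spirit of option 1 is to make the stores volatile. This is my own suggestion rather than part of the options above, the function name is hypothetical, and it costs you other optimizations on that loop as well:

```c
#define N 8

/* Zeroes the array through a volatile-qualified pointer. Each volatile
   store has to be performed as written, so the optimizer should not
   merge the loop into a single llvm.memset call. */
void zero_without_memset(int a[N])
{
        volatile int *p = a;

        for (int i = 0; i != N; ++i)
                p[i] = 0;
}
```

This only protects the loops you rewrite by hand, so treat it as a targeted fix and still check the optimized IR for remaining intrinsic calls.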

I hope this helps you somehow. Cheers!

Michael Haidl answered Oct 30 '22 06:10