
What are the differences between MSIL and LLVM bitcode?

Tags: llvm, cil

I'm new to .Net and I'm trying to understand the basics first. What is the difference between MSIL and LLVM bitcode?

Blazej SLEBODA asked Dec 03 '17 19:12


1 Answer

Both LLVM bitcode and MSIL are intermediate languages. Essentially, they are generic assembly languages: not as high-level as most source languages (e.g., Swift, C#) but also not as low-level as real assembly (e.g., ARM, x86). There are a number of technical implementation differences between the two languages, but most developers don't need to know the small stuff*. They just need to know how each is used in its platform's distribution model.


The LLVM bitcode format is a serialized form of the intermediate representation (IR) used inside the LLVM compiler. The "front end" of the compiler translates the source language (such as Swift) into LLVM bitcode, and then the "back end" of the compiler translates the bitcode into the target instruction set (such as ARM machine code). (Note: A previous version of this answer implied LLVM bitcode was processor-agnostic. That is not the case, because what the front end emits depends on the target processor, for example through type sizes and calling conventions.)
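To make the front end / back end split concrete, here is a minimal Python sketch of the idea, not how LLVM itself is implemented. Every name in it (`front_end`, `back_end_three_operand`, `back_end_two_operand`) is hypothetical, the IR is a simple list of tuples, and the "assembly" it prints is made up rather than real ARM or x86 syntax.

```python
import ast

# Hypothetical toy pipeline: one front end, one shared IR, two back ends.
# The "assembly" produced here is invented and is not real ARM or x86 syntax.

OPCODES = {ast.Add: "add", ast.Mult: "mul"}


def front_end(source):
    """Translate an arithmetic expression into a list of IR instructions."""
    ir = []

    def lower(node):
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return str(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in OPCODES:
            left, right = lower(node.left), lower(node.right)
            dest = f"%t{len(ir)}"  # fresh virtual register for this result
            ir.append((OPCODES[type(node.op)], dest, left, right))
            return dest
        raise ValueError("unsupported expression")

    lower(ast.parse(source, mode="eval").body)
    return ir


def back_end_three_operand(ir):
    """Back end for a pretend target with three-operand instructions."""
    return [f"{op} {dest}, {a}, {b}" for op, dest, a, b in ir]


def back_end_two_operand(ir):
    """Back end for a pretend target whose destination is also an input."""
    lines = []
    for op, dest, a, b in ir:
        lines.append(f"mov {dest}, {a}")
        lines.append(f"{op} {dest}, {b}")
    return lines


ir = front_end("(a + b) * c")
print(ir)
print(back_end_three_operand(ir))
print(back_end_two_operand(ir))
```

The same IR feeds both back ends, which is the property that lets a compiler support many source languages and many targets without writing a translator for every pair.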

Apple allows iOS developers to submit their apps as either fully-compiled ARM code or as LLVM bitcode, the latter of which:

[...] will allow Apple to re-optimize your app binary in the future without the need to submit a new version of your app to the store.

Essentially, you run the LLVM front end in your development environment and pass the bitcode to Apple, which runs the LLVM back end on its servers. This process is known as ahead-of-time (AOT) compilation (the Wikipedia article is of two minds as to whether the non-bitcode case is also AOT or if that's just "standard" compilation).

But whether or not you use bitcode, iOS end users always get the app as ARM machine code.


Things are a bit different in .NET. Most .NET code is compiled to MSIL, which is packaged in files called assemblies. The .NET runtime on an end user's device loads and executes assemblies, compiling the MSIL to machine code for the device's processor at runtime. This is called just-in-time (JIT) compilation.
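As a rough illustration of the JIT idea, the sketch below compiles a method the first time it is called and reuses the result afterwards. It is a toy under stated assumptions, not how the .NET runtime works internally: `ToyRuntime` and the dict-based "assembly" are hypothetical, the opcodes are only loosely modeled on CIL mnemonics, and "native code" is stood in for by a compiled Python function.

```python
class ToyRuntime:
    """Hypothetical runtime: holds IL-like method bodies, compiles on first call."""

    def __init__(self, assembly):
        self.assembly = assembly   # method name -> list of IL-like ops
        self.compiled = {}         # method name -> compiled callable
        self.compile_count = 0

    def _jit(self, name):
        # Turn the stack-based ops into a single Python expression string.
        stack = []
        for op, *operand in self.assembly[name]:
            if op == "ldarg":
                stack.append(f"args[{operand[0]}]")
            elif op == "ldc":
                stack.append(repr(operand[0]))
            elif op in ("add", "mul"):
                right, left = stack.pop(), stack.pop()
                symbol = "+" if op == "add" else "*"
                stack.append(f"({left} {symbol} {right})")
            elif op == "ret":
                break
            else:
                raise ValueError(f"unknown op: {op}")
        source = f"lambda *args: {stack.pop()}"
        self.compile_count += 1
        return eval(compile(source, f"<jit:{name}>", "eval"))

    def call(self, name, *args):
        # Compile lazily, on the first call only, then reuse the result.
        if name not in self.compiled:
            self.compiled[name] = self._jit(name)
        return self.compiled[name](*args)


# Toy "assembly" with one method computing (arg0 + arg1) * 2.
assembly = {
    "scale_sum": [("ldarg", 0), ("ldarg", 1), ("add",), ("ldc", 2), ("mul",), ("ret",)],
}
runtime = ToyRuntime(assembly)
print(runtime.call("scale_sum", 3, 4))  # compiled here, on first use
print(runtime.call("scale_sum", 5, 6))  # served from the cache
print(runtime.compile_count)
```

The point of the sketch is the timing: the distributed artifact stays in its intermediate form, and translation happens on the end user's machine only when a method is actually needed.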

Normally, MSIL is processor-agnostic, so most developers can think of .NET apps as also being processor-agnostic. However, there are a number of ways that processor-specific code can be packaged before the end user runs the app through the JIT:

  1. Some tools, like the Native Image Generator and .NET Native, allow AOT compilation. In fact, Universal Windows Platform (UWP) apps uploaded to the Microsoft Store are AOT-compiled: you submit the MSIL version of your app to Microsoft, then their servers use .NET Native to compile it for the various architectures Windows 10 supports.

  2. It's also possible to include native code with assemblies themselves; these are called mixed assemblies.

  3. MSIL itself can be processor-specific, if the source language uses "unsafe" operations (e.g., pointer math in C#).

But these are typically the exception, rather than the rule. Usually, .NET apps are distributed in MSIL, and end users' devices are where the native code is generated.


So in summary:

  • LLVM bitcode is processor-specific, but not quite as low-level as actual machine code. Apple allows iOS developers to submit apps as bitcode, to allow for future re-compilations when optimizations can be introduced. The end user runs native executables.

  • MSIL is usually processor-agnostic. The end user typically runs this processor-agnostic code, with .NET compiling the MSIL to native code at runtime. However, there are some cases where some or all of the app could be native code.


* Of course, if you are interested in the technical details, there are standards for LLVM bitcode and for MSIL, under its ECMA name CIL. I'm moderately knowledgeable in the latter; after a cursory glance at the former, the most notable technical difference is the execution model: LLVM bitcode is register-based (instructions name their operands and results as virtual registers), while MSIL/CIL uses an evaluation stack (instructions push and pop their operands).
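The register-versus-stack distinction is easier to see side by side. The following Python sketch evaluates the same expression, (a + b) * c, with two tiny interpreters. Both instruction sets are invented for illustration and are neither real LLVM IR nor real CIL; `run_stack` and `run_register` are hypothetical names.

```python
def run_stack(program, variables):
    """Stack style: operands are pushed, operators pop two values and push one."""
    stack = []
    for op, *operand in program:
        if op == "load":
            stack.append(variables[operand[0]])
        elif op in ("add", "mul"):
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if op == "add" else left * right)
        else:
            raise ValueError(f"unknown op: {op}")
    return stack.pop()


def run_register(program, variables):
    """Register style: every instruction names its destination and its inputs."""
    registers = dict(variables)
    result = None
    for op, dest, a, b in program:
        if op == "add":
            result = registers[a] + registers[b]
        elif op == "mul":
            result = registers[a] * registers[b]
        else:
            raise ValueError(f"unknown op: {op}")
        registers[dest] = result
    return result


values = {"a": 2, "b": 3, "c": 4}

stack_program = [("load", "a"), ("load", "b"), ("add",), ("load", "c"), ("mul",)]
register_program = [("add", "%t0", "a", "b"), ("mul", "%t1", "%t0", "c")]

print(run_stack(stack_program, values))        # 20
print(run_register(register_program, values))  # 20
```

The stack form needs more, simpler instructions with implicit operands; the register form needs fewer instructions, each of which spells out where its inputs come from and where its result goes.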

Joe Sewell answered Sep 30 '22 18:09