I have been checking out integer-gmp source code to understand how foreign primops can be implemented in terms of cmm as documented on GHC Primops page. I am aware of techniques to implement them using llvm hack or fvia-C/gcc - this is more of a learning experience for me to understand this third approach that interger-gmp library uses.
So, I looked up CMM tutorial on MSFT page (pdf link), went through GHC CMM page, and still there are some unanswered questions (hard to keep all those concepts in head without digging into CMM which is what I am doing now). There is this code fragment from integer-bmp cmm file:
integer_cmm_int2Integerzh (W_ val) { W_ s, p; /* to avoid aliasing */ ALLOC_PRIM_N (SIZEOF_StgArrWords + WDS(1), integer_cmm_int2Integerzh, val); p = Hp - SIZEOF_StgArrWords; SET_HDR(p, stg_ARR_WORDS_info, CCCS); StgArrWords_bytes(p) = SIZEOF_W; /* mpz_set_si is inlined here, makes things simpler */ if (%lt(val,0)) { s = -1; Hp(0) = -val; } else { if (%gt(val,0)) { s = 1; Hp(0) = val; } else { s = 0; } } /* returns (# size :: Int#, data :: ByteArray# #) */ return (s,p); }
As defined in ghc cmm header:
W_ is alias for word. ALLOC_PRIM_N is a function for allocating memory on the heap for primitive object. Sp(n) and Hp(n) are defined as below (comments are mine): #define WDS(n) ((n)*SIZEOF_W) //WDS(n) calculates n*sizeof(Word) #define Sp(n) W_[Sp + WDS(n)]//Sp(n) points to Stackpointer + n word offset? #define Hp(n) W_[Hp + WDS(n)]//Hp(n) points to Heap pointer + n word offset?
I don't understand lines 5-9 (line 1 is the start in case you have 1/0 confusion). More specifically:
The function as I understand it (from looking at function signature in Prim.hs) takes an int, and returns a (int, byte array) (stored in s
,p
respectively in the code).
For anyone who is wondering about inline call in if block
, it is cmm implementation of gmp mpz_init_si function. My guess is if you call a function defined in object file through ccall, it can't be inlined (which makes sense since it is object-code, not intermediate code - LLVM approach seems more suitable for inlining through LLVM IR). So, the optimization was to define a cmm representation of the function to be inlined. Please correct me if this guess is wrong.
Explanation of lines 5-9 will be very much appreciated. I have more questions about other macros defined in integer-gmp file, but it might be too much to ask in one post. If you can answer the question with a Haskell wiki page or a blog (you can post the link as answer), that would be much appreciated (and if you do, I would also appreciate step-by-step walk-through of an integer-gmp cmm macro such as GMP_TAKE2_RET1
).
Those lines allocate a new ByteArray# on the Haskell heap, so to understand them you first need to know a bit about how GHC's heap is managed.
Each capability (= OS thread that executes Haskell code) has its own dedicated nursery, an area of the heap into which it makes normal, small allocations like this one. Objects are simply allocated sequentially into this area from low addresses to high addresses until the capability tries to make an allocation which exceeds the remaining space in the nursery, which triggers the garbage collector.
All heap objects are aligned to a multiple of the word size, i.e., 4 bytes on 32-bit systems and 8 bytes on 64-bit systems.
The Cmm-level register Hp points to (the beginning of) the last word which has been allocated in the nursery. HpLim points to the last word which can be allocated in the nursery. (HpLim can also be set to 0 by another thread to stop the world for GC, or to send an asynchronous exception.)
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/HeapObjects has information on the layout of individual heap objects. Notably each heap object begins with an info pointer, which (among other things) identifies what sort of heap object it is.
The Haskell type ByteArray# is implemented with the heap object type ARR_WORDS. An ARR_WORDS object just consists of (an info pointer followed by) a size (in bytes) followed by arbitrary data (the payload). The payload is not interpreted by the GC, so it can't store pointers to Haskell heap objects, but it can store anything else. SIZEOF_StgArrWords is the size of the header common to all ARR_WORDS heap objects, and in this case the payload is just a single word, so SIZEOF_StgArrWords + WDS(1) is the amount of space we need to allocate.
ALLOC_PRIM_N (SIZEOF_StgArrWords + WDS(1), integer_cmm_int2Integerzh, val) expands to something like
Hp = Hp + (SIZEOF_StgArrWords + WDS(1)); if (Hp > HpLim) { HpAlloc = SIZEOF_StgArrWords + WDS(1); goto stg_gc_prim_n(integer_cmm_int2Integerzh, val); }
First line increases Hp by the amount to be allocated. Second line checks for heap overflow. Third line records the amount that we tried to allocate, so the GC can undo it. The fourth line calls the GC.
The fourth line is the most interesting. The arguments tell the GC how to restart the thread once garbage collection is done: it should reinvoke integer_cmm_int2Integerzh with argument val. The "_n" in stg_gc_prim_n (and the "_N" in ALLOC_PRIM_N) means that val is a non-pointer argument (in this case an Int#). If val were a pointer to a Haskell heap object, the GC needs to know that it is live (so it doesn't get collected) and to reinvoke our function with the new address of the object. In that case we'd use the _p variant. There are also variants like _pp for multiple pointer arguments, _d for Double# arguments, etc.
After line 5, we've successfully allocated a block of SIZEOF_StgArrWords + WDS(1) bytes and, remember, Hp points to its last word. So, p = Hp - SIZEOF_StgArrWords sets p to the beginning of this block. Lines 8 fills in the info pointer of p, identifying the newly-created heap object as ARR_WORDS. CCCS is the current cost-center stack, used only for profiling. When profiling is enabled each heap object contains an extra field that basically identifies who is responsible for its allocation. In non-profiling builds, there is no CCCS and SET_HDR just sets the info pointer. Finally, line 9 fills in the size field of the ByteArray#. The rest of the function fills in the payload and return the sign value and the ByteArray# object pointer.
So, this ended up being more about the GHC heap than about the Cmm language, but I hope it helps.
In order to do arithmetic and logical operations computers have digital circuit called ALU (Arithmetic Logic Unit) in their CPU (Central Processing Unit). An ALU loads data from input registers. Processor register is memory storage in L1 cache (data requests within 3 CPU clock ticks) implemented in SRAM(Static Random-Access Memory) located in CPU chip. A processor often contains several kinds of registers, usually differentiated by the number of bits they can hold.
Numbers are expressed in discrete bits can hold finite number of values. Typically numbers have following primitive types exposed by the programming language (in Haskell):
8 bit numbers = 256 unique representable values 16 bit numbers = 65 536 unique representable values 32 bit numbers = 4 294 967 296 unique representable values 64 bit numbers = 18 446 744 073 709 551 616 unique representable values
Fixed-precision arithmetic for those types has been implemented in hardware. Word size refers to the number of bits that can be processed by a computer's CPU in one go. For x86 architecture this is 32 bits and x64 this is 64 bits.
IEEE 754 defines floating point numbers standard for {16, 32, 64, 128} bit numbers. For example 32 bit point number (with 4 294 967 296 unique values) can hold approximate values [-3.402823e38 to 3.402823e38] with accuracy of at least 7 floating point digits.
Acronym GMP means GNU Multiple Precision Arithmetic Library and adds support for software emulated arbitrary-precision arithmetic's. Glasgow Haskell Compiler Integer implementation uses this.
GMP aims to be faster than any other bignum library for all operand sizes. Some important factors in doing this are:
- Using full words as the basic arithmetic type.
- Using different algorithms for different operand sizes; algorithms that are faster for very big numbers are usually slower for small numbers.
- Highly optimized assembly language code for the most important inner loops, specialized for different processors.
For some Haskell might have slightly hard to comprehend syntax so here is javascript version
var integer_cmm_int2Integerzh = function(word) { return WORDSIZE == 32 ? goog.math.Integer.fromInt(word)) : goog.math.Integer.fromBits([word.getLowBits(), word.getHighBits()]); };
Where goog is Google Closure library class used is located in Math.Integer. Called functions :
goog.math.Integer.fromInt = function(value) { if (-128 <= value && value < 128) { var cachedObj = goog.math.Integer.IntCache_[value]; if (cachedObj) { return cachedObj; } } var obj = new goog.math.Integer([value | 0], value < 0 ? -1 : 0); if (-128 <= value && value < 128) { goog.math.Integer.IntCache_[value] = obj; } return obj; }; goog.math.Integer.fromBits = function(bits) { var high = bits[bits.length - 1]; return new goog.math.Integer(bits, high & (1 << 31) ? -1 : 0); };
That is not totally correct as return type should be return (s,p);
where
In order to fix this GMP wrapper should be created. This has been done in Haskell to JavaScript compiler project (source link).
ALLOC_PRIM_N (SIZEOF_StgArrWords + WDS(1), integer_cmm_int2Integerzh, val); p = Hp - SIZEOF_StgArrWords; SET_HDR(p, stg_ARR_WORDS_info, CCCS); StgArrWords_bytes(p) = SIZEOF_W;
Are as follows
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With