Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

haskell implementation questions

Haskell is a really great language. I love it. But as a C++ programmer and with some basic knowledge about the computer architecture, I really want to know the implementation details about Haskell.

I mean, for example, map function. I know the grammar, the result. However, I want to know how does this function really work in the RAM or so. Because C family language is very clear about the mapping between the grammar and the computer behaviors.

So does anybody have ideas about the computer behaviors behind functional programmings grammar? Or any books about this like “inside the C++ object model”?

like image 795
user3129535 Avatar asked Dec 11 '22 08:12

user3129535


1 Answers

First, a warning: One of the fundamental properties of Haskell is that the compiler is highly likely to perform very radical transformations of your code during compilation. So the code that actually executes may not resemble what you wrote very closely at all.

That caveat asside, to a first approximation you can expect every variable and every data field to be a pointer to a heap object. Some of these heap objects represent data (integers, bools, characters, list nodes, etc.), and some of them represent Haskell code that hasn't executed yet due to laziness. If you write a long, complex expression, each and every sub-expression becomes a heap object, with the top-level expressions pointing to the lower ones. So the entire expression graph of your program becomes an object graph on the heap.

(Graph /= tree. A tree has no "loops" in it. Haskell allows recursion, so Haskell expressions aren't necessarily expression trees.)

Big Haskell expressions become a whole bunch of heap objects. Complicated nested pattern matches get desugared into a series of single-layer ones in an optimal order. All the other syntax sugar of the language gets stripped away, until there are only 6 primitives left:

  1. Create variable
  2. Use variable
  3. Create function
  4. Call function
  5. Create data record
  6. Inspect data record

If you want to know how the actual bits and bytes work, that's down to the compiler. If you mean GHC, it works approximately like this:

  • Every heap object is either code or data.

  • If it's data, it contains a value constructor ID number and one pointer for each of that constructor's fields.

  • If it's code (i.e., some subexpression that hasn't executed yet), it contains one a pointer to the function's machine code block, and one pointer for each of the function's arguments (no currying).

  • When you try to do a conditional branch, I/O operation or seq on a heap object, the run-time jumps to some code pointed to by the heap object.

  • If the object represents data, it will point to some code that instantly returns.

  • If the object represents code, it will point to the code that actually implements the function. This code computes the function's answer, and overwrites the original heap object with a data object representing the answer.

  • Either way, when control returns to the caller, the heap object will definitely be a data object, and it can now inspect the contructor ID or whatever else it wanted to do.

like image 181
MathematicalOrchid Avatar answered Jan 04 '23 13:01

MathematicalOrchid