Suppose you're compiling a functional language to portable C, and suppose also that for various reasons you want precise rather than conservative garbage collection. There is no portable way (perhaps no way at all in the general case) for the garbage collector to figure out what is and isn't a pointer on the C stack. It seems to me there are two solutions to this problem: <ol> <li>Shadow stack. Make each C function maintain bookkeeping information about what is and isn't a pointer. This is the approach recommended by e.g. LLVM.</li> <li>Take advantage of the fact that you are compiling a functional language, which means mainline code has no side effects. When the allocator detects out of memory, instead of calling the garbage collector itself, it aborts the current operation with a longjmp back to the main loop, which calls the garbage collector (in a context where the set of variables that may contain pointers is known in advance) then restarts the operation.</li> </ol> It seems to me that, if you are dealing with a pure functional language where the second approach is applicable, it must be more efficient than the first approach, as well as easier to mix with handwritten C. Are there any problems I'm overlooking? Any references to existing discussion or implementations of this technique?

It is possible to design a pure FP language using a single data structure: <pre class="prettyprint"><code>typedef enum record_type { RT_SYMBOL, RT_NUMBER, RT_PAIR }; struct record { record_type type; void *value; }; </code></pre> Programs and data can be represented using <code>pairs</code> of <code>records</code>: <pre class="prettyprint"><code>struct pair { record *car; record *cdr; }; </code></pre> Here is how a simple expression - <code>2 * 3</code> - could be represented using <code>records</code>: <pre class="prettyprint"><code>record r1; r1.type = RT_NUMBER; r1.value = &two; record r2; r1.type = RT_NUMBER; r1.value = &three; record opr1; opr1.type = RT_NUMBER; opr1.value = &OP_MULT; /* A machine op-code for multiplication. */ pair p_oprnds; p_oprnds.car = &r1; p_oprnds.cdr = &r2; pair p; p.car = opr1; p.cdr = p_oprnds; </code></pre> This is the same as the Lisp expression: <code>(* 2 3)</code>. Now you can define a machine that operates on <code>pairs</code>, treating the <code>car</code> as an operator and the <code>cdr</code> as operands. As we deal with only one data structure, precise GC is possible. See Lispkit Lisp for the architecture of such a VM. Also read Lisp in Small Pieces before starting off with a serious attempt on writing an FP -> C compiler.

I think the most obvious thing you haven't described is how to handle persistent out-of-memory. As you've described it, suppose I have an operation that uses more memory than is available. Eventually I reach: <ol> <li>Start whatever stage of the operation takes us over the limit.</li> <li>Run for a while.</li> <li>Run out of memory.</li> <li>Free all the memory allocated by this stage (there is nothing else to free).</li> <li>Go to 1.</li> </ol> That is, an infinite loop. So at the least you want some sort of generational garbage collection that will allow you to detect the loop and exit.

Compiling functional languages to C

Suppose you're compiling a functional language to portable C, and suppose also that for various reasons you want precise rather than conservative garbage collection. There is no portable way (perhaps no way at all in the general case) for the garbage collector to figure out what is and isn't a pointer on the C stack. It seems to me there are two solutions to this problem:

Shadow stack. Make each C function maintain bookkeeping information about what is and isn't a pointer. This is the approach recommended by e.g. LLVM.
Take advantage of the fact that you are compiling a functional language, which means mainline code has no side effects. When the allocator detects out of memory, instead of calling the garbage collector itself, it aborts the current operation with a longjmp back to the main loop, which calls the garbage collector (in a context where the set of variables that may contain pointers is known in advance) then restarts the operation.

It seems to me that, if you are dealing with a pure functional language where the second approach is applicable, it must be more efficient than the first approach, as well as easier to mix with handwritten C.

Are there any problems I'm overlooking? Any references to existing discussion or implementations of this technique?

What languages can compile to C?

The language Haxe can output to C++, C#, Java, JavaScript, Python(experimental), PHP, Flash and NekoVM. Show activity on this post. Vala and Genie are languages that use the GObject type system and compile to C code. I've never used them but they look interesting.

Can functional programming be done in C?

Obviously, C is a procedural language and doesn't really support functional programming natively.

Do all languages compile to C?

Having said all that, the answer to your question is "No". C is based off of a language called ALGOL, and there were many competitors both with ALGOL (FORTRAN, Lisp, COBOL) and C (none come to mind).

What is functional language in C?

C language is procedural. While it can be used to apply it as a functional programming doesn't make it one. When it was developed, the concept of functional programming did not exist.

It is possible to design a pure FP language using a single data structure:

typedef enum record_type { RT_SYMBOL, RT_NUMBER, RT_PAIR };

struct record
{
  record_type type;
  void *value;  
};

Programs and data can be represented using pairs of records:

struct pair
{
  record *car;
  record *cdr;
};

Here is how a simple expression - 2 * 3 - could be represented using records:

record r1;
r1.type = RT_NUMBER;
r1.value = &two; 

record r2;
r1.type = RT_NUMBER;
r1.value = &three; 

record opr1;
opr1.type = RT_NUMBER;
opr1.value = &OP_MULT; /* A machine op-code for multiplication. */

pair p_oprnds;
p_oprnds.car = &r1;
p_oprnds.cdr = &r2;

pair p;
p.car = opr1;
p.cdr = p_oprnds;

This is the same as the Lisp expression: (* 2 3). Now you can define a machine that operates on pairs, treating the car as an operator and the cdr as operands. As we deal with only one data structure, precise GC is possible. See Lispkit Lisp for the architecture of such a VM.

Also read Lisp in Small Pieces before starting off with a serious attempt on writing an FP -> C compiler.

I think the most obvious thing you haven't described is how to handle persistent out-of-memory. As you've described it, suppose I have an operation that uses more memory than is available. Eventually I reach:

Start whatever stage of the operation takes us over the limit.
Run for a while.
Run out of memory.
Free all the memory allocated by this stage (there is nothing else to free).
Go to 1.

That is, an infinite loop. So at the least you want some sort of generational garbage collection that will allow you to detect the loop and exit.

Compiling functional languages to C

Tags:

c

functional-programming

garbage-collection

compiler-construction

language-implementation

rwallace

People also ask

2 Answers

Vijay Mathew

Steve Jessop

Recent Activity

Donate For Us

Compiling functional languages to C

Tags:

c

functional-programming

garbage-collection

compiler-construction

language-implementation

rwallace

People also ask

2 Answers

Vijay Mathew

Steve Jessop

Related questions

Recent Activity

Donate For Us