I'm curious to hear people's opinions on how hard it would be to implement a compiler on an FPGA. This could be just a compiler backend, LLVM's for example: the implementation would take in LLVM IR and output machine code.
The purpose of this would be to allow, so to speak, real-time execution of source code (or intermediate representation code), in the sense that you feed code in and the resulting machine instructions are executed immediately, with no separate build step.
For a given system, a more or less static part of the FPGA could be the LLVM backend, i.e., the part that decides what type of machine code to output: for example x86-64 with SSE4, or ARM Thumb-2 with NEON and VFP instructions. Unless you have a system with multiple CPUs, this would remain the same. It shouldn't be completely static, though, and thus not implemented in fixed hardware, because optimizations to the compiler are made constantly and it would need to be updated from time to time. The more frequently changing part of the FPGA would be the front-end, the part that produces the LLVM IR from a given language: C, C++, Vala, etc.
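For concreteness, the whole "backend choice" boils down to a couple of strings in LLVM: a target triple plus a CPU feature string. A minimal sketch against a recent LLVM C++ API (signatures from memory, so treat it as illustrative); the triple and feature strings are examples only:

```cpp
// How the backend choice is expressed in LLVM: a target triple plus a
// CPU feature string. Illustrative sketch, assuming a recent LLVM.
#include <optional>
#include <string>

#include "llvm/MC/TargetRegistry.h"
#include "llvm/Support/TargetSelect.h"
#include "llvm/Target/TargetMachine.h"

llvm::TargetMachine *pickBackend() {
  llvm::InitializeAllTargets();
  llvm::InitializeAllTargetMCs();

  std::string err;
  std::string triple = "x86_64-unknown-linux-gnu"; // or "thumbv7-none-eabi"
  const llvm::Target *target = llvm::TargetRegistry::lookupTarget(triple, err);
  if (!target)
    return nullptr;

  // "+sse4.2" for the x86 case; "+neon,+vfp3" for the ARM case above.
  return target->createTargetMachine(triple, /*CPU=*/"generic",
                                     /*Features=*/"+sse4.2",
                                     llvm::TargetOptions(), std::nullopt);
}
```

The point being that the "static part" is really just this configuration; everything downstream of it is the generic code generator.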
The neat thing about this system would be that the code is always optimized for the CPU in the system at hand. Currently, few builds take advantage of all the extra functionality in CPUs: SSE, AVX, 3DNow!, NEON, VFP. Using this (completely hypothetical) approach, the full potential of a CPU could be exploited by compiling for the specific architecture in real time and executing the produced instructions immediately after. This would be especially useful on ARM-based systems, where we need all the juice we can squeeze out of the CPU and the CPU itself is very slow at doing the compilation.
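As a baseline for what the FPGA would be competing against: "compile for the exact CPU and run immediately" already exists in software as JIT compilation. A minimal sketch, assuming LLVM's ORC LLJIT API (roughly LLVM 15+, signatures from memory); the file name input.ll and entry-point name entry are hypothetical:

```cpp
// Software-only version of "compile for the exact CPU, execute
// immediately": LLVM's ORC LLJIT, which targets the host by default,
// so emitted code can use whatever features the machine actually has.
#include <memory>

#include "llvm/ExecutionEngine/Orc/LLJIT.h"
#include "llvm/ExecutionEngine/Orc/ThreadSafeModule.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IRReader/IRReader.h"
#include "llvm/Support/Error.h"
#include "llvm/Support/SourceMgr.h"
#include "llvm/Support/TargetSelect.h"

int main() {
  llvm::InitializeNativeTarget();
  llvm::InitializeNativeTargetAsmPrinter();

  auto ctx = std::make_unique<llvm::LLVMContext>();
  llvm::SMDiagnostic diag;
  auto mod = llvm::parseIRFile("input.ll", diag, *ctx); // IR shipped as-is
  if (!mod)
    return 1;

  auto jit = llvm::cantFail(llvm::orc::LLJITBuilder().create());
  llvm::cantFail(jit->addIRModule(
      llvm::orc::ThreadSafeModule(std::move(mod), std::move(ctx))));

  // Code generation happens here, for the host's actual feature set.
  auto sym = llvm::cantFail(jit->lookup("entry"));
  return sym.toPtr<int (*)()>()();
}
```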
I know gcc builds can be parallelized (with make -j, for instance), and I'd assume parallelizing a compiler that way is relatively easy: just compile all the source files in parallel.
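A minimal sketch of that file-level parallelism; the compiler command and file names are just placeholders:

```cpp
// File-level parallel compilation, essentially what `make -jN` does:
// each translation unit is an independent job.
#include <cstdlib>
#include <future>
#include <string>
#include <vector>

int compile(const std::string &file) {
  // Shell out to the system compiler; one independent job per source file.
  return std::system(("cc -c " + file).c_str());
}

int main() {
  std::vector<std::string> files = {"a.c", "b.c", "c.c"}; // hypothetical inputs
  std::vector<std::future<int>> jobs;
  for (const auto &f : files)
    jobs.push_back(std::async(std::launch::async, compile, f));

  int rc = 0;
  for (auto &j : jobs)
    rc |= j.get(); // non-zero if any compile failed
  return rc;
}
```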
We could also ditch the front-end - the programming language-specific part of the compiler - and just distribute programs as intermediate representation code like LLVM IR.
Is this in any way feasible?
I wouldn't bother. I'd configure the FPGA as an LLVM IR VM and just run the code directly, delegating control of the hardware to the CPU.
Certain parts of compilation are very easily parallelized in a non-threaded way. For example, string-keyed dictionaries are very common, so a content addressable memory could provide a significant optimization.
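To illustrate, this is the software pattern a CAM would accelerate: interning each identifier string to a small integer index. On an FPGA, the find-or-insert lookup could complete in a single cycle instead of a hash probe. A sketch with illustrative names:

```cpp
// The software pattern a content-addressable memory would accelerate:
// interning identifier strings to small integer indices.
#include <string>
#include <unordered_map>
#include <vector>

struct SymbolTable {
  std::unordered_map<std::string, unsigned> index; // the CAM candidate
  std::vector<std::string> names;                  // index -> original spelling

  unsigned intern(const std::string &name) {
    // Find-or-insert in one step; a CAM could do this in a single cycle.
    auto [it, inserted] = index.try_emplace(name, (unsigned)names.size());
    if (inserted)
      names.push_back(name);
    return it->second;
  }
};
```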
FPGAs are going to fare very poorly for certain aspects of compilation though. Overload resolution, for example, has to consider argument-dependent lookup, user-defined conversions, templates, etc.
You'll get the best performance by pipelining and using the resources of both the FPGA and CPU. For example, let the FPGA lex the source code and create a token stream with all identifiers replaced by symbol table indices, while the CPU runs later compilation steps (inlining and loop optimizations, for example).
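The hand-off between the two could be a stream of fixed-width token records, something like the following sketch (the layout is just an assumption, not an established format):

```cpp
// One possible shape for the FPGA -> CPU token stream: fixed-width records,
// identifiers already reduced to symbol-table indices so later passes never
// touch raw strings.
#include <cstdint>

enum class TokenKind : std::uint8_t { Identifier, Keyword, IntLiteral, Punct, Eof };

struct Token {
  TokenKind kind;
  std::uint32_t payload; // symbol-table index, keyword id, or literal value
  std::uint32_t line;    // kept so the CPU side can still emit diagnostics
};

static_assert(sizeof(Token) <= 12, "compact enough to stream over a bus");
```

Fixed-width records are convenient here because the FPGA can emit them at a steady rate over a DMA channel while the CPU consumes them.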
Of course, as you already pointed out, this doesn't help much with per-machine optimizations if the code can be pre-processed and distributed in a p-code format. It might make a nice compile accelerator during development, though.
I had the same idea some time ago.
Implementing such a complex program on an FPGA is possible, given adequate synthesis technology. Behavioral synthesis (aka C-to-HDL synthesis) would make it feasible.
The funny thing is that if the output of your compiler is also an HDL, one can imagine bootstrapping the behavioral synthesizer (i.e., making it synthesize itself), which is generally an important validation step for a compiler.