
What's the general procedure for compiling an HDL Program for an FPGA?

I have a question regarding the compilation of HDL programs within the context of FPGA design.

1) Why does the compilation process take so long? Is it really the compilation process that takes a long time, or is it the writing of individual logic gates that takes a long time?

2) Why are the compiled files generally referred to as 'bitfiles'? What is the format of these bitfiles? I'm picturing a two-dimensional matrix of gates that will either be opened or closed depending on the bits in the bitfile.

Thanks for any help!

Izzo asked May 04 '16 23:05

1 Answer

1) Why does the compilation process take so long? Is it really the compilation process that takes a long time, or is it the writing of individual logic gates that takes a long time?

To begin, if you want to see all the toil and hard work your FPGA tools do, just turn on verbose mode/detailed reports, and skim/read them.

I'm going to answer with a Xilinx viewpoint, since that's what I know. Although the processes may have different names/groupings/ordering, the idea is the same across vendors.

The HDL->bitstream process differs from how one would compile, say, Java. It's not just a conversion of each line to some bytecode, but an involved process in which the entire design is converted to a hardware implementation. You're not converting a program to hardware, but a description of hardware to hardware. You only call a pile of Verilog or VHDL a program when it's running a testbench in a simulator.

Remember that timing constraints are a thing, and thus optimization for timing/depth of logic is a top priority.

In practice, synthesis converts behavioral Verilog/VHDL to an RTL representation. That includes FSM synthesis, extraction of boolean functions, recognition of decoders/encoders, muxes, ROMs, etc., and optimization of the result. Additionally, the synthesis step will duplicate registers whose values are needed in multiple areas of the FPGA, so that the routing delays to those areas are minimized. Some synthesis tools, such as XST, will provide a rough estimate of timing and device utilization at this stage.

Additionally, remember that synthesis involves some level of inference. HDL code that matches certain motifs/patterns will be converted to hardware macros or instantiations of certain primitives. If I write code that accesses a large reg[7:0] foo [2047:0] synchronously based on an address (and possibly a write enable), then the synthesis tool will try to detect that and put a block RAM in place. It will also try to optimize away unneeded logic and may do fairly in-depth logical analysis in that optimization.

Translation/mapping involves tons of hardware logic intricacies as well. At this stage the software will try to stuff your logic functions into lookup tables in optimal ways, fit those into slices alongside the flip-flops that they may drive, and optimize again. At this step, redundant or superfluous components left over from optimization may be removed.
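To make the "stuff your logic functions into lookup tables" idea concrete, here is a minimal Python sketch of how any boolean function of four inputs reduces to 16 stored bits. The function names (lut_init, lut_eval, example) and the bit ordering are assumptions made for illustration; real mapping tools also have to split larger functions across several LUTs, which this does not attempt.

```python
def lut_init(func, num_inputs):
    """Return func's truth table as an integer; bit i is the output for input pattern i."""
    init = 0
    for index in range(2 ** num_inputs):
        # Bit k of index is the value driven onto LUT input k (assumed ordering).
        inputs = [(index >> k) & 1 for k in range(num_inputs)]
        if func(*inputs):
            init |= 1 << index
    return init


def lut_eval(init, inputs):
    """Look up the stored output bit for the given input values."""
    index = sum(bit << k for k, bit in enumerate(inputs))
    return (init >> index) & 1


# Hypothetical logic cone: a 2:1 mux gated by an enable, four inputs in total.
def example(a, b, sel, en):
    return (b if sel else a) & en


INIT = lut_init(example, 4)
print(hex(INIT))  # prints 0xca00
```

Once the 16 bits are known, the original gates no longer matter: the hardware just indexes into the table, which is why the mapper's job is mostly deciding which pieces of logic share a table.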

Placing and routing (PAR) is one of the most compute-intensive steps in many designs. Now that mapping has produced a sea of lookup tables and registers connected by a slew of wires, they all need to be placed and connected using limited interconnect resources. The limitations include the number of lines in a row/column, which bits can connect to other bits at certain distances, and clock distribution. Remember again that timing constraints exist. PAR may be able to place a design quickly, but then spend a very long time trying to tweak the placement to meet those constraints. Placing and routing isn't an easy-to-solve problem, and involves a lot of brute force, randomized placement based on cost tables, and other heuristic approaches. Needless to say, this can take a long time.

Imagine trying to organize the below-shown circuit with no more than two crossings per wire and no more than 25cm of wire in the timing-critical path, just on the scale of an FPGA:

[image]
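For a feel of why placement eats time, here is a toy Python sketch of randomized placement by simulated annealing. It is not how any vendor tool is known to work internally; the grid size, the two-pin nets, the wirelength-only cost function, and the cooling schedule are all assumptions chosen to keep the example small. It also assumes exactly one cell per grid site.

```python
import math
import random


def wirelength(placement, nets):
    """Total Manhattan distance over all two-pin nets."""
    total = 0
    for a, b in nets:
        (ax, ay), (bx, by) = placement[a], placement[b]
        total += abs(ax - bx) + abs(ay - by)
    return total


def anneal(num_cells, nets, width, height, steps=5000, seed=0):
    """Swap pairs of cells at random, keeping swaps that help (and some that don't)."""
    rng = random.Random(seed)
    sites = [(x, y) for x in range(width) for y in range(height)]
    rng.shuffle(sites)
    placement = {cell: sites[cell] for cell in range(num_cells)}
    cost = wirelength(placement, nets)
    initial_cost = cost
    best, best_cost = dict(placement), cost
    temperature = 5.0
    for _ in range(steps):
        a, b = rng.sample(range(num_cells), 2)
        placement[a], placement[b] = placement[b], placement[a]
        new_cost = wirelength(placement, nets)
        delta = new_cost - cost
        # Always accept improvements; accept worse moves with a shrinking probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = dict(placement), cost
        else:
            placement[a], placement[b] = placement[b], placement[a]  # undo the swap
        temperature = max(0.01, temperature * 0.999)
    return initial_cost, best_cost, best


# Hypothetical design: 16 cells wired in a chain, placed on a 4x4 grid.
NETS = [(i, i + 1) for i in range(15)]
initial_cost, best_cost, best = anneal(16, NETS, 4, 4)
print(initial_cost, best_cost)
```

Even this toy needs thousands of trial swaps for 16 cells and one cost term. A real device has far more cells, and the cost also has to account for timing and routing congestion, which is where the hours go.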

2) Why are the compiled files generally referred to as 'bitfiles'? What is the format of these bitfiles? I'm picturing a two-dimensional matrix of gates that will either be opened or closed depending on the bits in the bitfile.

You're pretty close, though not exactly. The bitstream configures the following parameters:

  • Routing. What signals go where, over what wires. This typically sets multiplexers and cross-connections. Pretty spot-on to what you mention, though they're not so much gates as connections (albeit fully buffered to avoid capacitance effects).

  • Slices. Each slice contains a few lookup tables used as function generators, as well as more multiplexers and such. The bitstream also specifies the contents of the lookup tables, whether they should be bypassed or linked, whether the output should go straight to routing or to a flip-flop, whether that flip-flop should have an async reset, whether it should be posedge- or negedge-triggered, and so on. For distributed memory slices, it also holds configuration related to writing/shifting the LUT under external control.

  • Other function blocks. How DSP/multiplier tiles should be configured, parameters/connectivity for clock-handling circuitry such as DCMs/PLLs/MMCMs/etc., widths/fallthrough/initial contents of block RAMs, the parameters for transceivers, et cetera.

  • Metadata. For example, a setting that prevents reading back the bitstream over the configuration port/JTAG, if the design should not be copied.
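As a rough picture of how bits turn into configuration, the Python sketch below decodes an invented 25-bit frame for a single logic cell. Real bitstream layouts are vendor-specific and far more involved; the field layout here (16 LUT bits, one flip-flop enable bit, four 2-bit routing selects) and the names ToyCell, decode_cell, and evaluate are purely hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ToyCell:
    lut_init: int       # 16 truth-table bits for a 4-input LUT
    use_flop: bool      # whether the LUT output is registered in a flip-flop
    input_select: list  # for each LUT input, which of 4 routing wires feeds it


def decode_cell(bits):
    """Decode a hypothetical 25-bit frame given as a string of '0'/'1' characters."""
    if len(bits) != 25:
        raise ValueError("expected a 25-bit frame")
    lut_init = int(bits[0:16], 2)
    use_flop = bits[16] == "1"
    # Four 2-bit fields, one routing multiplexer select per LUT input.
    input_select = [int(bits[17 + 2 * i:19 + 2 * i], 2) for i in range(4)]
    return ToyCell(lut_init, use_flop, input_select)


def evaluate(cell, wires):
    """Combinational output of the cell for the given routing wire values."""
    inputs = [wires[sel] for sel in cell.input_select]
    index = sum(bit << k for k, bit in enumerate(inputs))
    return (cell.lut_init >> index) & 1


# A frame that configures the cell as a 4-input AND of wires 0..3, unregistered.
frame = format(0x8000, "016b") + "0" + "00" + "01" + "10" + "11"
cell = decode_cell(frame)
print(cell)
print(evaluate(cell, [1, 1, 1, 1]), evaluate(cell, [1, 0, 1, 1]))  # prints 1 0
```

The same hardware becomes an OR, a mux, or a wire-through just by changing those 25 bits, which is the sense in which the file really is a "bitfile".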

nanofarad answered Nov 12 '22 10:11