Various aspects of test strategies for FPGAs have been discussed here on SO but I can't find that the following question has been asked/discussed/answered:
At what levels should you simulate your FPGA design and what do you verify at each level?
If you answer using concepts such as x-level testing where x = block, subsystem, function, or something else, please describe what x is for you. Something like typical size, complexity, or an example.
Both given answers are the same when it comes to the actual question but I'll accept the answer from @kraigher since it's the shortest one.
This is a summary and a comparison of the two answers from @Paebbles and @kraigher. One of the answers is very long, so hopefully this will help anyone who wants to contribute their own answer. Remember that there's a bounty at stake!
There is a difference in how much in-lab testing they do, but it seems mostly related to the specific circumstances of the projects (how many things could not be effectively tested with simulations). I happen to know a bit about @kraigher's last project, so I can say that both projects are in the 1+ year category. It would be interesting to hear a story from someone with a smaller project. From what I've seen, far from all projects are this complete with respect to functional coverage in simulation, so there must be other stories.
This is a number of follow-up questions to @Paebbles, too long to fit among the comments.
Yes @Paebbles, you have provided much of what I was looking for, but I still have extra questions. I'm afraid this may be a lengthy discussion, but given the amount of time we spend on verification and the various strategies people apply, I think it deserves a lot of attention. Hopefully we will get some more answers so that various approaches can be compared. Your bounty will surely help.
I think your story contains many good and interesting solutions but I'm an engineer so I will focus on the pieces I think can be challenged ;-)
You've spent a lot of time testing on hardware to address all the external issues you had. From a practical point of view (since they were not going to fix their SATA standard violations) it's like having a flawed requirement spec such that you develop a design solving the wrong problem. This is typically discovered when you “deliver”, which motivates why you should deliver frequently and discover the problems early, as you did. I'm curious about one thing though. When you discovered a bug in the lab that needed a design change, would you then update the testbenches at the lowest level where this could be tested? Not doing that increases the risk that the bug reappears in the lab, and over time it would also degrade the functional coverage of your testbenches, making you more dependent on lab testing.
You said that most testing was done in the lab and that was caused by the amount of external problems you had to debug. Is your answer the same if you just look at your own internal code and bugs?
When you're working with long turnaround times like you did you find various ways to make use of that time. You described that you started to synthesize the next design when the first was being tested and if you found a bug in one drive you started to synthesize a fix for that one while continuing to test other drives with the current design. You also described problems with observability when testing in the lab. I'm going to do a number of sceptical interpretations of this, you have to provide the positive ones!
If you could synthesize the next design immediately when you started to test the first, it seems like you were working with very small increments but still made the effort to run every test at every level all the way to hardware. This seems a bit overkill/expensive, especially when you're not fully automated on the hardware testing. Another sceptical interpretation is that you're looking for a bug but, due to poor observability, you are producing random trial-and-error builds hoping that they will give clues to the problem you're trying to isolate. Was this really effective use of time in the sense that every build added value, or was it more “doing something is better than doing nothing”?
When designing the higher protocol layers, did you consider short-circuiting the communication stack at the higher levels to speed up the simulation? After all, the lower layers were already tested.
You reused some components and assumed them to be bug free. Was that because they were shipped with testbenches proving that? Proven-in-use arguments tend to be weak since reuse often happens in another context. The Ariane 5 rocket is a spectacular example; your reuse of XAPP 870 for the Virtex-5 is another.
Since you have simulations at various levels I would assume that you value the faster run times at the lower levels and the shorter feedback loop you have when you can verify a piece of your design before the larger structure has been completed. Still you had pieces of code that were significant enough to be awarded their own components but still too simple to be awarded their own testbenches. Can you give an example of such a component? Were they really bug free? Personally I don't write many lines of code before I make a mistake so if I have a well packaged piece of code like a component I take the opportunity to test at that level for the reasons mentioned above.
I'll try to explain my testing strategies by examples.
Introduction:
I developed a Serial-ATA controller for my final bachelor project, which evolved into a very large project in the months after my graduation. The testing requirements got harder, and of course every new bug or performance shortfall was harder to find, so I needed ever more clever tools, strategies and solutions for debugging.
Development steps:
Phase 1: A ready to use IP Core example
I started on a Virtex-5 platform (ML505 board) and a Xilinx XAPP 870 with example code. Additionally, I had the SATA and ATA standards, the Xilinx user guides, as well as 2 test drives. After a short period, I noticed that the example code was mostly written for a Virtex-4 FPGA and that CoreGenerator produced invalid code: unconnected signals, unassigned inputs, and values configured contrary to the SATA specification.
Rule #1: Double check generated code lines, they may contain systematic faults.
Phase 2: Total rewrite of the transceiver code and design of a new physical layer
I developed a new transceiver and a physical layer to perform the basic SATA handshaking protocol. As I wrote my bachelor report, there was no good GTP_DUAL transceiver simulation model and I had no time to write my own, so I tested everything on real hardware. The transceiver could be simulated, but the electrical IDLE conditions needed for the OOB handshake protocol were not implemented or not working. After I finished my report, Xilinx updated the simulation model and I could simulate the handshake protocol, but by then everything was already running on real hardware (see Phase 5).
How can one test an FPGA hard macro without simulation?
Luckily, I had a Samsung Spinpoint HDD, which powered up only after valid handshaking sequences. So I had an acoustic response.
The FPGA design was equipped with big ChipScope ILAs, which used 98% of the BlockRAM, to monitor the transceiver behavior. It was the only way to guess what was going on on the high-speed serial wires. We had other difficulties which we could not solve.
Rule #2: If your design has space left, use it for ILAs to monitor the design.
Phase 3: A link layer
After some successful link ups with 2 HDDs I started to design the link layer.
This layer has big FSMs, FIFOs, scramblers, CRC generators and so on. Some components like the FIFOs were provided for my bachelor project, so I assumed these components were bug free. Otherwise I could have run the provided simulations myself and changed parameters.
My own sub-components were tested by simulation in testbenches (=> component-level tests). After that I wrote an upper-layer testbench that could act as host or device, so I was able to build the following stack:
1. Testbench(Type=Host)
2. LinkLayer(Type=Host)
3. wires with delay
4. LinkLayer(Type=Device)
5. Testbench(Type=Device)
The SATA link layer transmits and receives data frames, so the usual process statement for stimulus generation produced far too much code and was not maintainable. I developed a data structure in VHDL that stored test cases, frames and data words, including flow-control information. (=> subsystem-level simulation)
Rule #3: Building a counterpart design (e.g. the device) can help in simulations.
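A minimal sketch of what such a frame/test-case structure could look like (all names and sizes below are my own invention, not the original code):

    -- Hypothetical sketch of a frame-based test-case structure: each test case is
    -- a list of frames, each frame a list of 32-bit data words plus a simple
    -- flow-control hint for the driver that replays it.
    library ieee;
    use ieee.std_logic_1164.all;

    package sata_tb_frames is
      constant MAX_WORDS_PER_FRAME : positive := 64;
      constant MAX_FRAMES_PER_TEST : positive := 16;

      type t_word_vector is array (natural range <>) of std_logic_vector(31 downto 0);

      type t_frame is record
        word_count  : natural range 0 to MAX_WORDS_PER_FRAME;
        words       : t_word_vector(0 to MAX_WORDS_PER_FRAME - 1);
        insert_hold : boolean;   -- ask the driver to throttle (flow control)
      end record;

      type t_frame_vector is array (natural range <>) of t_frame;

      type t_testcase is record
        name        : string(1 to 32);
        frame_count : natural range 0 to MAX_FRAMES_PER_TEST;
        frames      : t_frame_vector(0 to MAX_FRAMES_PER_TEST - 1);
      end record;
    end package;

A constant array of such test cases can then describe a whole regression run, and the host-side testbench simply iterates over it instead of containing hand-written stimulus processes.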
Phase 4: Test the link layer on real hardware
The host-side testbench layer from Phase 3 was written to be synthesizable, too. So I plugged it together:
1. Testbench(Type=Host)
2. LinkLayer(Type=Host)
3. PhysicalLayer
4. TransceiverLayer
5. SATA cables
6. HDD
I stored a startup sequence as a list of SATA frames in the testbench's ROM and monitored the HDD responses with ChipScope.
Rule #4: Synthesizable testbenches can be reused in hardware. The formerly generated ILAs could be reused, too.
Now was the point in time to test different HDDs and monitor their behavior. After a while of testing I could communicate with a handful of disks and SSDs. Some vendor-specific workarounds were added to the design (e.g. expect double COM_INIT responses from WDC drives :) )
At this point, synthesis needed circa 30-60 minutes to complete. This was caused by a midrange CPU, >90% FPGA utilization (BlockRAM) and timing problems in the ChipScope cores. Some parts of the Virtex-5 design run at 300 MHz, so buffers got filled very fast. On the other hand, a handshake sequence can take up to 800 us (normally < 100 us), and there are devices on the market which sleep for 650 us before they respond! So I looked into the fields of storage qualification, cross-triggering and data compression.
While a synthesis was running, I tested my design with different devices and wrote a test result sheet for every device. When synthesis was complete and Map/P&R were still outstanding, I restarted it with modified code. So I had several designs in flight :).
Phase 5: Higher layers:
Next I designed the transport layer and the command layer. Each layer has a standalone testbench, as well as sub component testbenches for complex sub modules. (=> component and subsystem level tests)
All modules were plugged together in a multi-layer testbench. I designed a new data generator so I did not have to handcode each frame; only the sequence of frames had to be written.
I also added a wire delay between the two LinkLayer instances, which had been measured in ChipScope before. The checker testbench was the same as above, filled with expected frame orders and prepared response frames.
Rule #5: Some delays let you find protocol/handshake problems between FSMs.
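As an illustration of Rule #5, a tiny sketch (entity and signal names invented) of how a measured cable delay can be modelled between the two LinkLayer instances in the testbench:

    -- Minimal sketch: model the measured wire delay between the host and device
    -- LinkLayer instances with transport-delayed assignments in the testbench.
    library ieee;
    use ieee.std_logic_1164.all;

    entity tb_linklayer_pair is
    end entity;

    architecture sim of tb_linklayer_pair is
      constant C_CABLE_DELAY : time := 4 ns;  -- example value, e.g. measured with ChipScope
      signal host_tx,   host_rx   : std_logic_vector(31 downto 0);
      signal device_tx, device_rx : std_logic_vector(31 downto 0);
    begin
      -- LinkLayer(Type=Host) and LinkLayer(Type=Device) instances omitted here;
      -- they would drive host_tx/device_tx and sample host_rx/device_rx.
      device_rx <= transport host_tx   after C_CABLE_DELAY;
      host_rx   <= transport device_tx after C_CABLE_DELAY;
    end architecture;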
Phase 6: Back to the FPGA
The stack was synthesized again. I changed my ILA strategy to one ILA per protocol layer. I generated the ILAs via CoreGenerator, which allowed me to use a new ChipScope core type, the VIO (Virtual Input/Output). This VIO transfers simple I/O operations (button, switch, LED) to and from the FPGA board via JTAG, so I could automate some of my testing processes. The VIO was also able to display values as ASCII strings, so I decoded some error bits from the design into readable messages. This saved me from searching through synthesis reports and VHDL code. I switched all FSMs to gray encoding, to save BlockRAMs.
Rule #6: Readable error messages save time
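Rule #6 in code form could look roughly like this (the error codes and message texts are invented); the resulting vector is fed to a VIO input that ChipScope displays as ASCII:

    -- Hypothetical sketch: map a 4-bit error code to an 8-character ASCII string
    -- packed into a std_logic_vector, which a ChipScope VIO can display as text.
    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    package error_ascii is
      subtype t_ascii8 is std_logic_vector(63 downto 0);  -- 8 characters x 8 bit
      function error_to_ascii(code : std_logic_vector(3 downto 0)) return t_ascii8;
    end package;

    package body error_ascii is
      function to_ascii8(msg : string) return t_ascii8 is
        variable result : t_ascii8 := (others => '0');    -- padded with zero bytes
      begin
        for i in 1 to msg'length loop
          exit when i > 8;
          -- place character i into byte i, counted from the left
          result(63 - 8*(i - 1) downto 56 - 8*(i - 1)) :=
            std_logic_vector(to_unsigned(character'pos(msg(i)), 8));
        end loop;
        return result;
      end function;

      function error_to_ascii(code : std_logic_vector(3 downto 0)) return t_ascii8 is
      begin
        case code is
          when "0000" => return to_ascii8("OK      ");
          when "0001" => return to_ascii8("CRC_ERR ");
          when "0010" => return to_ascii8("TIMEOUT ");
          when others => return to_ascii8("UNKNOWN ");
        end case;
      end function;
    end package body;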
Phase 7: Advancements in ChipScope debugging
Each layer's ILA had a trigger output, which was connected to trigger inputs of the other ILAs. This enables cross-triggering. E.g. it's possible to use a complex condition like: trigger in the TransportLayer if a frame is aborted after the LinkLayer has received the third EOF sequence.
Rule #7: Use multiple ILAs and connect their triggers cross-wise.
Complex triggers allow one to isolate the fault without time-consuming resynthesis. I also started to extract FSM encodings from synthesis reports, so I could load the extracted data as token files into ChipScope and display FSM states with their real names.
Phase 8: A serious bug
Next, I was confronted with a serious bug. After 3 frames my FSMs got stuck, but I could not find the cause in ChipScope, because everything looked OK. I could not add more signals, because a Virtex-5 has only 60 BlockRAMs ... Luckily, I could dump all frame transactions from HDD startup until the fault in ChipScope, and ChipScope can export data as *.vcd dumps.
I wrote a VHDL package to parse and import *.vcd dump files in iSim, so I could use the dumped data to simulate the complete Host <-> HDD interactions.
Rule #8: Dumped inter-layer transfer data can be used in simulation for a more detailed look.
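The original package parsed real *.vcd files; the following reduced sketch (the file name and two-column format are my own simplification) only shows the std.textio mechanics of replaying a dumped signal in a simulator like iSim:

    -- Reduced sketch: replay a pre-processed dump (one "<time_in_ns> <32-bit value>"
    -- pair per line) onto a signal, using std.textio. A real *.vcd parser also has
    -- to handle the VCD header, identifier codes and value-change records.
    library ieee;
    use ieee.std_logic_1164.all;
    use std.textio.all;

    entity dump_replay is
      generic (DUMP_FILE : string := "linklayer_dump.txt");  -- hypothetical file name
      port (data_out : out std_logic_vector(31 downto 0));
    end entity;

    architecture sim of dump_replay is
    begin
      replay : process
        file dump       : text open read_mode is DUMP_FILE;
        variable l      : line;
        variable t_ns   : integer;
        variable value  : bit_vector(31 downto 0);
        variable t_next : time;
      begin
        while not endfile(dump) loop
          readline(dump, l);
          read(l, t_ns);                       -- timestamp in ns
          read(l, value);                      -- 32-bit sample as '0'/'1' characters
          t_next := t_ns * 1 ns;
          if t_next > now then
            wait for t_next - now;             -- advance simulation time to the sample
          end if;
          data_out <= to_stdlogicvector(value);
        end loop;
        wait;                                  -- dump exhausted, stop the process
      end process;
    end architecture;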
Pause
By then, the SATA stack was quite complete and passed all my tests. I got assigned to two other projects:
The first project reused the frame-based testbenches and the per-layer/protocol ILAs. The second project used an 8-bit CPU (PicoBlaze) to build an interactive test controller called SoFPGA. It can be remote controlled via standard terminals (PuTTY, KiTTY, Minicom, ...).
At that time a colleague ported the SATA controller to the Stratix-II and Stratix-IV platforms. He just had to exchange the transceiver layer and design some adapters.
SATA Part II:
The SATA controller was to get an upgrade: support for 7-Series FPGAs and 6.0 Gb/s transfer speed. The new platform was a Kintex-7 (KC705).
Phase 9:
Testing such big designs with buttons and LEDs is not doable. A first approach was the VIO core from Phase 6; then I chose to include the previously developed SoFPGA. I added an I²C controller, which was needed to reprogram an on-board clock generator from 156.25 to 150 MHz. I also implemented measurement modules to measure transfer rate, elapsed time and so on. Error bits from the controller were connected to the interrupt pin of the SoFPGA, and errors were displayed on a PuTTY screen. I also added SoFPGA-controllable components for fault injection. For example, it's possible to insert bit errors into SATA primitives but not into data words.
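A fault injector of this kind could look roughly like the following sketch (the port names and arming scheme are invented, not the original design):

    -- Hypothetical sketch: flip one bit of the outgoing 32-bit word, but only
    -- while a SATA primitive (not a payload data word) is being transmitted.
    -- 'inject' would be pulsed by the SoFPGA via its register interface.
    library ieee;
    use ieee.std_logic_1164.all;

    entity Primitive_Fault_Injector is
      port (
        clk          : in  std_logic;
        inject       : in  std_logic;                 -- one-cycle pulse from the SoFPGA
        bit_index    : in  integer range 0 to 31;     -- which bit to corrupt
        is_primitive : in  std_logic;                 -- '1' while a primitive is on the bus
        data_in      : in  std_logic_vector(31 downto 0);
        data_out     : out std_logic_vector(31 downto 0)
      );
    end entity;

    architecture rtl of Primitive_Fault_Injector is
    begin
      process(clk)
      begin
        if rising_edge(clk) then
          data_out <= data_in;
          if (inject = '1') and (is_primitive = '1') then
            data_out(bit_index) <= not data_in(bit_index);
          end if;
        end if;
      end process;
    end architecture;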
With this technique, we could prove protocol implementation faults in several SATA devices (HDDs and SSDs). It's possible to cause a deadlock in the link layer FSM of some devices. This is caused by a missing edge in their LinkLayer FSM transition diagram :)
With the SoFPGA approach, it was easy to modify tests, reset the design, report errors, and even run benchmarks.
Rule #9: The usage of a soft core allows you to write tests/benchmarks in software. Detailed error reporting can be done via terminal messages. New test programs can be uploaded via JTAG -> no synthesis needed.
Phase 0: back to the beginning:
My reset network was very, very bad, so I redesigned it with the help of two colleagues. The new clock network has separate resets for the clock wires and MMCMs, as well as stable signals to indicate proper clock signals and frequencies. This is needed because the external input clock is reprogrammed at runtime, SATA generation changes can cause clock divider switching at runtime, and reset sequences in the transceiver can cause unstable clock outputs from the transceiver. Additionally, we implemented a powerdown signal to start from zero. So if our SoFPGA triggers a powerdown/powerup sequence, the SATA controller is as fresh as right after programming the FPGA. This saves masses of time!
Rule #0: Implement proper resets so that every test behaves in the same way and no reprogramming of the FPGA is needed. Add clock-domain crossing circuits! This prevents many random faults.
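One typical ingredient of such a reset network is a per-clock-domain reset synchronizer with asynchronous assertion and synchronous release; a minimal sketch (not the original PoC code):

    -- Minimal sketch of a reset synchronizer: the reset asserts asynchronously
    -- and is released synchronously to 'clk', two flip-flops deep, so every test
    -- run starts from the same deterministic state.
    library ieee;
    use ieee.std_logic_1164.all;

    entity reset_sync is
      port (
        clk      : in  std_logic;
        arst_n   : in  std_logic;   -- asynchronous reset request, active low
        rst_sync : out std_logic    -- synchronously released reset, active high
      );
    end entity;

    architecture rtl of reset_sync is
      signal taps : std_logic_vector(1 downto 0) := (others => '1');
    begin
      process(clk, arst_n)
      begin
        if arst_n = '0' then
          taps <= (others => '1');        -- assert immediately
        elsif rising_edge(clk) then
          taps <= taps(0) & '0';          -- shift in '0' to release synchronously
        end if;
      end process;

      rst_sync <= taps(1);
    end architecture;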
Notes:
Some sub components from the SATA controller are published in our PoC-Library. There are also testbench packages and scripts to ease testing. The SoFPGA core is published, too. My PicoBlaze-Library project eases the SoC development.
Is it fair to say that your levels of testing are component level (simulation of CRC, complex FSMs), subsystem level (simulation of one of your layers), top-level simulation, lab testing w/o SW, and lab testing with SW (using SoFPGA)?
Yes, I used component testing for mid-size components. Some of them were already ready to use, so I trusted the developers. Small components were tested in the subsystem-level test. I believed in my code, so there was no separate testbench; if one should have a fault, I'll see it in the bigger testbench.
When I started part II of the development, I used top-level testbenches. On the one hand a simulation model was available, but it was very slow (it took hours for a simple frame transfer). On the other hand our controller is full of ILAs, and the Kintex-7 offers several hundred BlockRAMs. Synthesis takes circa 17 minutes (incl. 10 ILAs and one SoFPGA). So in this project lab testing is faster than simulation. Many improvements (token files, SoFPGA, cross-ILA triggering) eased the debugging process significantly.
Can you give a ballpark figure of how your verification efforts (developing and running tests and debugging on that level) were distributed among your levels?
I think this is hard to tell. I worked 2 years on SATA and one year on IPv6 / SoFPGA. I think most (>60%) of the time was spent on "external debugging".
With a similar distribution, where do you detect your bugs and where do you isolate the root cause? What I mean is that what you detect in the lab may need simulations to isolate.
So as mentioned before, most testing was lab testing.
What is the typical turnaround time at the different levels? What I mean is the time it takes from when you decide to try something out until you've completed a new test run and have new data to analyze.
Because synthesis takes so long, we used pipelined testing. While we tested one design on the FPGA, a new one was already synthesizing. Or while one error got fixed and synthesized, we tested the design with the other disks (7) and SSDs (2). We created matrices of which disks failed and which did not.
Most debug solutions were invented with a forward look: reusability, parameterizability, ...
Last paragraph:
It was very hard work to get the Kintex-7 ready for SATA. Several questions were posted, e.g. Configuring a 7-Series GTXE2 transceiver for Serial-ATA (Gen1/2/3). But we could not find a proper configuration for the GTXE2 transceiver. So, with the help of our embedded SoFPGA, we developed a PicoBlaze-to-DRP adapter. The Dynamic Reconfiguration Port (DRP) is the interface from the FPGA fabric into the transceiver configuration bits. On the one hand we monitored the frequency sliding unit in the transceiver while it adapted to the serial line. On the other hand we reconfigured the transceiver at runtime via the SoFPGA, controlled from a PuTTY terminal. We tested > 100 configurations in 4 hours with only 3 synthesis runs. Synthesizing each configuration would have cost us weeks...
When you discovered a bug in the lab that needed a design change would you then update the testbenches at the lowest level where this could be tested?
Yes, we updated the testbenches to reflect the changed implementation, so we hopefully would not run into the same pitfall again.
You said that most testing was done in the lab and that was caused by the amount of external problems you had to debug. Is your answer the same if you just look at your own internal code and bugs?
I designed the state machines defensively. For example, there is always an others or else case, so if one of the developers (we are now a group of four) adds new states and misses edges, these transitions get caught. Each FSM has at least one error state, which is entered on transition faults or when sub-components report errors. One error code is generated per layer. The error condition bubbles up to the topmost FSM. Depending on the error severity (recoverable, not recoverable, ...), an upper FSM performs recovery procedures or halts. The state of all FSMs plus their error conditions is monitored by ChipScope, so in most cases it's possible to discover failures in less than a minute. The tuple (FSM state; error code) mostly identifies the cause very exactly, so I can name a module and code line.
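A stripped-down sketch of that defensive FSM style (the states, inputs and error codes are invented):

    -- Hypothetical sketch: every case has an 'others' branch and an explicit
    -- error state with an error code that a higher layer can inspect.
    library ieee;
    use ieee.std_logic_1164.all;

    entity link_fsm_sketch is
      port (
        clk, rst   : in  std_logic;
        frame_ok   : in  std_logic;
        crc_error  : in  std_logic;
        error_code : out std_logic_vector(3 downto 0)
      );
    end entity;

    architecture rtl of link_fsm_sketch is
      type t_state is (ST_IDLE, ST_RECEIVE, ST_CHECK, ST_ERROR);
      signal state : t_state := ST_IDLE;
    begin
      process(clk)
      begin
        if rising_edge(clk) then
          if rst = '1' then
            state      <= ST_IDLE;
            error_code <= x"0";
          else
            case state is
              when ST_IDLE    => if frame_ok = '1' then state <= ST_RECEIVE; end if;
              when ST_RECEIVE => state <= ST_CHECK;
              when ST_CHECK   =>
                if crc_error = '1' then
                  state      <= ST_ERROR;
                  error_code <= x"2";       -- e.g. "CRC failed"
                else
                  state <= ST_IDLE;
                end if;
              when ST_ERROR   => null;      -- wait for recovery from the upper layer
              when others     =>            -- missing edge / illegal encoding
                state      <= ST_ERROR;
                error_code <= x"F";
            end case;
          end if;
        end if;
      end process;
    end architecture;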
We also spent many hours designing a layer/FSM interaction protocol. We named this protocol/interface Command-Status-Error. An upper layer can monitor a lower layer via Status. If Status = STATUS_ERROR, then Error is valid. An upper layer can control a lower layer via Command.
It's maybe not very resource efficient (LUTs, Regs), but it's very efficient for debugging (time, error localisation).
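A rough sketch of what such a Command-Status-Error interface could look like as a shared package (the encodings are invented, not the original ones):

    -- Hypothetical sketch of the Command-Status-Error interface between layers.
    -- Direction, seen from a lower layer:
    --   Command : in  t_command;  -- driven by the upper layer
    --   Status  : out t_status;   -- monitored by the upper layer
    --   Error   : out t_error;    -- only valid while Status = STATUS_ERROR
    library ieee;
    use ieee.std_logic_1164.all;

    package layer_interface is
      subtype t_command is std_logic_vector(2 downto 0);
      subtype t_status  is std_logic_vector(2 downto 0);
      subtype t_error   is std_logic_vector(3 downto 0);

      constant CMD_NONE      : t_command := "000";
      constant CMD_TRANSMIT  : t_command := "001";
      constant CMD_RESET     : t_command := "010";

      constant STATUS_IDLE   : t_status := "000";
      constant STATUS_BUSY   : t_status := "001";
      constant STATUS_ERROR  : t_status := "111";

      constant ERROR_NONE    : t_error := "0000";
      constant ERROR_CRC     : t_error := "0001";
      constant ERROR_TIMEOUT : t_error := "0010";
    end package;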
[...] I'm going to do a number of skeptical interpretations of this, you have to provide the positive ones!
Developing SATA was at times a very depressing task, especially the parameter search for the transceiver :). But we also had good moments.
If you could synthesize the next design immediately when you started to test the first it seems like you were working with very small increments but still made the effort to run every test at every level all the way to hardware. This seems a bit overkill/expensive, especially when you're not fully automated on the hardware testing.
We did not run simulations every time, just after major changes in the design. While we tested one feature, we already started on the next one. It's a bit like wafer production: current chips already host test circuits of the next or next-but-one generation. => pipelining :) One drawback is that if a major error occurs, the pipeline must be cleared and each feature must be tested individually. This case was very rare.
The development process was always the question: Can I find the bug / solution in the next 5 days with my current set of tools or should I invest 1-2 weeks designing a better tool with better observability?
So we focused on automation and scripting to reduce human errors. Explaining it in detail would burst this answer :). But for example our SoFPGA exports ChipScope token files directly from VHDL. It also updates the assembly files at each synthesis run, so if one changes the SoFPGA design, all *.psm files are updated (e.g. device addresses).
Another sceptical interpretation is that you're looking for a bug but, due to poor observability, you are producing random trial-and-error builds hoping that they will give clues to the problem you're trying to isolate. Was this really effective use of time in the sense that every build added value, or was it more “doing something is better than doing nothing”?
We got no help from Xilinx regarding the correct GTXE2 settings. The internal design was also mostly unknown, so at some point it was trial and error. The only way forward was to narrow down the search space.
When designing the higher protocol layers, did you consider short-circuiting the communication stack at the higher levels to speed up the simulation? After all, the lower layers were already tested.
Yes, after the link layer was done, we left out all lower layers (physical layer, transceiver) to speed up simulation. Just the wire delay was left.
You reused some components and assumed them to be bug free. Was that because they were shipped with testbenches proving that?
The reused testbench component was written as normal synthesizable code, so after its testing it worked well both in simulation and on the device. The testbench component was tested by a testbench itself.
Still you had pieces of code that were significant enough to be awarded their own components but still too simple to be awarded their own testbenches. Can you give an example of such a component?
For example the Primitive_Mux and the Primitive_Detector. SATA inserts data words, CRC values or primitives into the 32-bit data stream. The Primitive_Mux is a simple multiplexer, but too big to be inlined into another component => readability, encapsulation.
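A sketch of roughly what such a Primitive_Mux could look like (the real port list and control scheme are certainly different):

    -- Hypothetical sketch: select between payload data, the CRC value and a SATA
    -- primitive for the 32-bit transmit stream.
    library ieee;
    use ieee.std_logic_1164.all;

    entity Primitive_Mux is
      port (
        sel_primitive : in  std_logic;                       -- '1': send primitive
        sel_crc       : in  std_logic;                       -- '1': send CRC word
        data_word     : in  std_logic_vector(31 downto 0);
        crc_word      : in  std_logic_vector(31 downto 0);
        primitive     : in  std_logic_vector(31 downto 0);
        tx_word       : out std_logic_vector(31 downto 0)
      );
    end entity;

    architecture rtl of Primitive_Mux is
    begin
      tx_word <= primitive when sel_primitive = '1' else
                 crc_word  when sel_crc = '1' else
                 data_word;
    end architecture;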
Personally I don't write many lines of code before I make a mistake so if I have a well packaged piece of code like a component I take the opportunity to test at that level for the reasons mentioned above.
I think no one writes bug-free code, but I think my MTBF has increased over the years ;).
Here is an example of how to take a sledgehammer to crack a nut:
This picture, captured by ChipScope connected to the RX Phase Interpolator, shows how the GTXE2 adjusts its clock phase to keep the bit alignment. We needed circa one week to implement it, so we could see what's going on in the GTXE2. We needed another week to implement a DRP adapter, because the mux selecting the values shown in the graphs could only be changed by dynamic reprogramming! Luckily we had our SoFPGA :)
The mostly linear graphs show the FPGA and HDD running at almost the same speed, with low jitter and a constant drift, because 2 oscillators are used. The Samsung 840 Pro shows us Spread Spectrum Clocking (SSC) behavior. The SATA line frequency of e.g. 6 GHz can be decreased by up to 0.5 percent using a triangle modulation. The valleys show us that the internal filter coefficients can't cope with SSC; it's incapable of handling the turning points of the triangle modulation. => So now we needed to find new filter parameters.
I perform behavioral simulation at all levels. That is, all entities should have one corresponding testbench aiming for full functional coverage. If specific details of entities A, B and C have already been tested in isolation in their corresponding testbenches, they do not have to be covered in the testbench for entity D, which instantiates A, B and C and should focus on proving the integration.
I also have device- or board-level tests where the actual design is verified on the actual device or board. This is because you cannot trust a device-level simulation when models start to become inexact, and it also takes too long. On the real device, hours of testing can be achieved instead of milliseconds.
I try to avoid performing any post-synthesis simulation unless a failure occurs in the device-level tests, in which case I perform it to find bugs in the synthesis tool. In that case I can make a small wrapper around the post-synthesis netlist and re-use the testbench from the behavioral simulation.
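The wrapper idea could look roughly like this (entity, port and netlist names are invented): an extra architecture of the behavioral top level that just instantiates the netlist, so the behavioral testbench can be re-used unchanged.

    -- Hypothetical sketch: 'my_design' stands for the behavioral top level (its
    -- entity already exists in the design; it is repeated here only to keep the
    -- sketch self-contained). The extra architecture wraps the post-synthesis
    -- netlist so the behavioral testbench can instantiate the same entity.
    library ieee;
    use ieee.std_logic_1164.all;

    entity my_design is
      port (
        clk  : in  std_logic;
        rst  : in  std_logic;
        din  : in  std_logic_vector(7 downto 0);
        dout : out std_logic_vector(7 downto 0)
      );
    end entity;

    architecture post_synth of my_design is
      -- component declaration matching the netlist written out by the synthesis tool
      component my_design_netlist is
        port (
          clk  : in  std_logic;
          rst  : in  std_logic;
          din  : in  std_logic_vector(7 downto 0);
          dout : out std_logic_vector(7 downto 0)
        );
      end component;
    begin
      netlist_inst : my_design_netlist
        port map (clk => clk, rst => rst, din => din, dout => dout);
    end architecture;

Which architecture gets simulated (the behavioral one or this wrapper) can then be selected with a configuration or by compile order.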
I work very hard to avoid any form of manual testing and instead rely on test automation frameworks for both simulation and device level testing such that testing can be performed continuously.
To automate simulation I use the VUnit test automation framework which @lasplund and myself are the authors of.
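For reference, a minimal VUnit-style testbench skeleton could look like this (the entity and test names are placeholders; a small Python run script that adds the source files is also required to launch the simulations):

    -- Minimal VUnit testbench skeleton: VUnit discovers the test cases named in
    -- run(...) and executes each of them as an individual test.
    library vunit_lib;
    context vunit_lib.vunit_context;

    library ieee;
    use ieee.std_logic_1164.all;

    entity tb_example is
      generic (runner_cfg : string);
    end entity;

    architecture tb of tb_example is
    begin
      main : process
      begin
        test_runner_setup(runner, runner_cfg);

        while test_suite loop
          if run("test_reset_values") then
            -- drive the DUT and check its reset values here
            check_equal(2 + 2, 4, "placeholder check");
          elsif run("test_single_frame") then
            -- send one frame and check the response here
            check(true, "placeholder check");
          end if;
        end loop;

        test_runner_cleanup(runner);
      end process;
    end architecture;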