
How to prevent system hang before watchdog timer task kicks in

We are using an ARM AM1808-based embedded system with an RTOS and a file system, programmed in C. We have a watchdog timer implemented inside the application code, so whenever something goes wrong in the application code, the watchdog timer takes care of the system.

However, we are experiencing an issue where the system hangs before the watchdog timer task starts. The system hangs because the file system code is badly written, with a large number of while loops. Sometimes, due to a bad NAND (or at least the file system code thinks it is bad), the code gets stuck in a while loop and never gets out of it. What we get is a dead board.

So, the point of giving all this information is to ask whether there is any mechanism that could be implemented in the code that runs before the application code. Is there a hardware watchdog? What steps can be taken to make sure we don't get a dead board caused by some while loop?

user9128860 asked Dec 21 '17 21:12


2 Answers

Professional embedded systems are designed like this:

  • Pick an MCU with power-on-reset interrupt and on-chip watchdog. This is standard on all modern MCUs.
  • Implement the below steps from inside the reset interrupt vector.
  • If the MCU memory is simple to set up, such as just setting the stack pointer, then do so as the first thing out of reset. This enables C programming. You can usually write the reset ISR in C as long as you don't declare any variables; disassemble to make sure that it doesn't touch any RAM addresses until those are available.
  • If the memory setup is complex (there is an MMU setup or similar), C code will have to wait and you'll have to stick to assembler to prevent accidental stacking caused by C code.
  • Set up the most fundamental registers, such as mode/peripheral routing registers, watchdog and system clock.
  • Set up the low-voltage detect (LVD) hardware, if applicable. Hopefully the out-of-reset state for LVD on the MCU is a sound one.
  • Application-specific, critical registers such as GPIO direction and internal pull resistor registers should be set from here. Many MCUs have pins as inputs by default, making them vulnerable. If they are not meant to be inputs in the application, the time they are kept as such out of reset should be minimized, to avoid problems with noise, transients and ESD.
  • Set up the MMU, if applicable.
  • Everything else the "CRT" does, such as initialization of .data and .bss.
  • Call main().
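
To make the ordering concrete, here is a minimal host-runnable sketch of the steps above. Everything in it is an assumption made for illustration: the register block `early_regs_t`, the bit masks, the stand-in `.data`/`.bss` arrays and the names `reset_sequence` and `app_entry_stub` are hypothetical and do not come from any particular MCU or toolchain. On a real part the registers would be memory-mapped at addresses from the reference manual, and the section bounds would come from the linker script.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register block. On real hardware this would be a volatile
 * struct placed at an address from the MCU reference manual; here it is an
 * ordinary variable so the sketch can run on a host PC. */
typedef struct {
    volatile uint32_t wdt_ctrl;  /* watchdog control (assumed layout) */
    volatile uint32_t clk_ctrl;  /* system clock control (assumed layout) */
    volatile uint32_t gpio_dir;  /* 1 = output, 0 = input (assumed layout) */
} early_regs_t;

static early_regs_t sim_regs;

#define WDT_ENABLE   0x1u   /* assumed bit: watchdog running */
#define CLK_USE_PLL  0x1u   /* assumed bit: switch from RC oscillator to PLL */
#define GPIO_OUTPUTS 0x0Fu  /* assumed: pins 0..3 are outputs in this design */

/* Stand-ins for the linker-provided .data load image, .data and .bss. */
static const uint32_t data_load_image[4] = { 1u, 2u, 3u, 4u };
static uint32_t data_section[4];
static uint32_t bss_section[4] = {
    0xDEADBEEFu, 0xDEADBEEFu, 0xDEADBEEFu, 0xDEADBEEFu
};

/* Hypothetical application entry point. Returns 0 if the watchdog was
 * already armed by the time the application was reached, -1 otherwise. */
int app_entry_stub(void)
{
    return (sim_regs.wdt_ctrl & WDT_ENABLE) ? 0 : -1;
}

/* Reset sequence in the order described above. */
int reset_sequence(int (*app_entry)(void))
{
    size_t i;

    /* 1. Fundamental registers first: watchdog, then system clock. */
    sim_regs.wdt_ctrl |= WDT_ENABLE;
    sim_regs.clk_ctrl |= CLK_USE_PLL;

    /* 2. Application-critical pin directions, as early as possible. */
    sim_regs.gpio_dir |= GPIO_OUTPUTS;

    /* 3. "CRT" work: copy .data from its load image, then zero .bss.
     *    Plain loops, since library routines may not be usable yet. */
    for (i = 0; i < 4; i++) {
        data_section[i] = data_load_image[i];
    }
    for (i = 0; i < 4; i++) {
        bss_section[i] = 0u;
    }

    /* 4. Only now hand over to the application (main() on a real target). */
    return app_entry();
}
```

Calling `reset_sequence(app_entry_stub)` runs the steps in order. The point of the ordering is that a hang anywhere in step 3 or later happens with the watchdog already running.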

Please note that pre-made startup code for your MCU is not necessarily made by professionals! It is fairly common that there's an amateur-level "CRT" delivered with your toolchain, which fails to set up the watchdog and clock early on. This is of course unacceptable since:

  1. This makes any program running on that platform a notable safety and quality hazard, in case the "CRT" crashes or hangs for whatever reason.
  2. This makes the initialization of .data and .bss needlessly, painfully slow, as it is then typically executed with the clock running on the default on-chip RC oscillator or similar.

Please note that even industry de facto startup code such as ARM CMSIS fails to do some of the MCU-specific hardware setup mentioned above. This may or may not be a problem.

Lundin answered Nov 03 '22 18:11


There is a hardware watchdog that can be started before the application runs. The ARM AM1808 does have a timer that can be configured as a watchdog, as per the documentation: www.ti.com/lit/ds/symlink/am1808.pdf. So, you may wish to enable it at least during the part of the program that runs through the critical and long section. You may wish to have a piece of boot code that first sets up this watchdog and, after correct initialization, jumps to the application. In fact, this is a very common approach.
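
As a rough illustration of that boot-stage idea, here is a host-side simulation rather than real AM1808 code. All names in it (`wdt_arm`, `wdt_kick`, `sim_tick`, `nand_is_ready`, `fs_init`, `boot_stage`) and the timeout value are hypothetical. The hardware reset is imitated with `setjmp`/`longjmp`, and the simulated NAND becomes ready on the second boot attempt only so that the demo terminates. On the real chip you would program the timer registers described in the TI documentation instead.

```c
#include <setjmp.h>
#include <stdbool.h>
#include <stdint.h>

#define WDT_TIMEOUT_TICKS 1000u  /* assumed timeout, for illustration only */

static jmp_buf  reset_vector;    /* simulated reset entry point */
static uint32_t wdt_counter;     /* simulated hardware down-counter */
static bool     wdt_armed;
static int      boot_attempts;   /* how many times the boot stage has run */
static int      watchdog_resets; /* how many simulated resets occurred */

static void wdt_arm(uint32_t ticks)  { wdt_counter = ticks; wdt_armed = true; }
static void wdt_kick(uint32_t ticks) { wdt_counter = ticks; }

/* One simulated hardware tick. If the armed watchdog expires, "reset" the
 * system by jumping back to the start of the boot stage. */
static void sim_tick(void)
{
    if (wdt_armed && --wdt_counter == 0u) {
        wdt_armed = false;
        longjmp(reset_vector, 1);
    }
}

/* Hypothetical NAND status poll: never ready on the first boot attempt. */
static bool nand_is_ready(void)
{
    sim_tick();                   /* time passes while we poll */
    return boot_attempts >= 2;
}

/* File system init with the kind of unbounded loop from the question. */
static void fs_init(void)
{
    while (!nand_is_ready()) {
        /* spins until the NAND reports ready, or the watchdog fires */
    }
}

/* Boot stage: arm the watchdog first, then run the risky initialization.
 * Returns the attempt number on which initialization succeeded. */
int boot_stage(void)
{
    if (setjmp(reset_vector) != 0) {
        watchdog_resets++;        /* we arrived here via a watchdog reset */
    }
    boot_attempts++;

    wdt_arm(WDT_TIMEOUT_TICKS);   /* before any file system code runs */
    fs_init();                    /* may hang; the watchdog covers it */
    wdt_kick(WDT_TIMEOUT_TICKS);  /* init done, restart the timeout */

    return boot_attempts;         /* real code would jump to the application */
}
```

In this simulation the first attempt hangs in `fs_init()`, the watchdog expires and restarts the boot stage, and the second attempt completes. With a NAND that stays bad, the real board would keep resetting instead of hanging, so some fallback after repeated resets is worth considering.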

VladP answered Nov 03 '22 19:11