I'm working on an existing driver that controls an 8-bit MCU over a serial port. There are many different flavors of firmware for the MCU but they all share a common method of trying to ensure link integrity. This method is not very robust and I am looking for ideas about how the driver could modify its behavior to get the most out of it. Commands are gcode with a line number and a checksum: <pre class="prettyprint"><code>N3 T0*57 N4 G92 E0*67 N5 G28*22 N6 G1 F1500.0*82 N7 G1 X2.0 Y2.0 F3000.0*85 N8 G1 X3.0 Y3.0*33 </code></pre> The line number must be sequential (but can be reset with <code>M110</code>). If the checksum doesn't match or the line number is out of sequence the firmware will reply <code>Resend: nnn</code> where <code>nnn</code> is the last successful <code>N</code> plus 1. The "checksum" is extremely primitive: <pre class="prettyprint"><code> // Calc checksum. byte checksum = 0; byte count = 0; while(instruction[count] != '*') checksum = checksum^instruction[count++]; </code></pre> The main problem is that the primary error mechanism is dropped bytes due to interrupts being held off leading to overflows in the 1-byte MCU FIFO. The actual serial bus is a few cm between an FTDI (or similar) USB serial bridge and the MCU, so bit errors are unlikely. I've never observed a bit error in a reply from the firmware. As you can see, the algorithm above would detect one dropped byte, but if you dropped two of the same byte (anywhere!) the result would still match. Thus <code>F3000.0</code> (feedrate 3000mm/min) could be transmuted to <code>F30.0</code> and still match. Plus, the character set is very small so certain bits are never going to be involved. Is there anything the driver can do to make a given line more robust? <ul> <li>Add or drop trailing (or even leading) zeroes</li> <li>Add or drop spaces</li> <li>Re-order words (<code>X1 Y1</code> is the same as <code>Y1 X1</code>)</li> <li>Add or drop spaces</li> <li>Make "insignificant" modifications to values within some tolerance (eg <code>F2999.9</code> instead of <code>F3000</code>)</li> <li>Reset the line number to get some specific <code>N</code> for a given line</li> <li>Break a single command into multiple equivalent commands (eg <code>G1 X2</code> becomes <code>G1 X1</code> <code>G1 X2</code> assuming X=0 initially)</li> <li>Eliminate (or add) superfluous words (eg <code>T0</code> is meaningless for most commands, and if you send <code>F3000</code> once it is implied in the future so it can be optionally sent or not)</li> </ul> If I believe that the firmware drops bytes in groups then the most important thing could be to avoid back-to-back duplicates like <code>00</code> which would (if dropped together) be invisible.

If you can't change the firmware, your options are quite limited for what you can do on the PC to improve the robustness of the link. Just a few possibilities: <ul> <li>Lower the data rate if the firmware allows for that (reduce chance of dropped bytes).</li> <li>Try to construct packets that don't contain zero or duplicate bytes, if the protocol or functionality has any flexibility to allow for that.</li> <li>Send messages several times (this makes it more likely the device will receive a good message; this doesn't help to reduce "false positives" however).</li> </ul> If you can change the firmware, then you have much greater potential for improvement: implement a proper CRC (even an 8-bit CRC would be a significant improvement, but 16 bits would be better). You would be best to implement auto-negotiation in the PC driver so it can talk both the "old" and "new" protocols, and figure out which type of device it's talking to.

Making the best of a bad "checksum" algorithm

Tags:

embedded

checksum

driver

serial-port

I'm working on an existing driver that controls an 8-bit MCU over a serial port. There are many different flavors of firmware for the MCU but they all share a common method of trying to ensure link integrity. This method is not very robust and I am looking for ideas about how the driver could modify its behavior to get the most out of it.

Commands are gcode with a line number and a checksum:

N3 T0*57
N4 G92 E0*67
N5 G28*22
N6 G1 F1500.0*82
N7 G1 X2.0 Y2.0 F3000.0*85
N8 G1 X3.0 Y3.0*33

The line number must be sequential (but can be reset with M110). If the checksum doesn't match or the line number is out of sequence the firmware will reply Resend: nnn where nnn is the last successful N plus 1. The "checksum" is extremely primitive:

        // Calc checksum.
        byte checksum = 0;
        byte count = 0;
        while(instruction[count] != '*')
                checksum = checksum^instruction[count++];

The main problem is that the primary error mechanism is dropped bytes due to interrupts being held off leading to overflows in the 1-byte MCU FIFO. The actual serial bus is a few cm between an FTDI (or similar) USB serial bridge and the MCU, so bit errors are unlikely. I've never observed a bit error in a reply from the firmware.

As you can see, the algorithm above would detect one dropped byte, but if you dropped two of the same byte (anywhere!) the result would still match. Thus F3000.0 (feedrate 3000mm/min) could be transmuted to F30.0 and still match. Plus, the character set is very small so certain bits are never going to be involved.

Is there anything the driver can do to make a given line more robust?

Add or drop trailing (or even leading) zeroes
Add or drop spaces
Re-order words (X1 Y1 is the same as Y1 X1)
Add or drop spaces
Make "insignificant" modifications to values within some tolerance (eg F2999.9 instead of F3000)
Reset the line number to get some specific N for a given line
Break a single command into multiple equivalent commands (eg G1 X2 becomes G1 X1 G1 X2 assuming X=0 initially)
Eliminate (or add) superfluous words (eg T0 is meaningless for most commands, and if you send F3000 once it is implied in the future so it can be optionally sent or not)

If I believe that the firmware drops bytes in groups then the most important thing could be to avoid back-to-back duplicates like 00 which would (if dropped together) be invisible.

475

asked Apr 05 '11 18:04

Ben Jackson

3 Answers

One thing you could try is to configure the UART on the host to send 2 stop bits instead of 1 (which is what you're probably currently using). The MCU receiver will notice nothing except that there's an extra bit-time of idle between characters. That's about 10% more time to get the character out of the receive register before the next character is shifted in.

Typically, a UART doesn't look for more than 1 stop bit on receive data, even if the UART is configured for 2 stop bits (there's no reason to enforce extra stop bits on receive), so the fact that the MCU will still send only a single stop bit shouldn't cause any problems on receiving responses from the device.

If you're at a high data rate this doesn't add much time, so it probably won't help (but that depends on what's the underlying cause of the overruns is). If my ciphering is right, it works out to another 25 microseconds for the MCU to avoid an overrun if you're running the link at 38400 bps.

It's a long shot, but it's a cheap change that should require no modifications other than the serial port configuration on the host side.

answered Sep 24 '22 23:09

Michael Burr

We may not all be familiar with G-Code, a link is always helpful for domain specific technology.

I would say that the simple checksum is probably adequately suited to the data length, format, and the processor performance. If you are already dropping characters, you hardly want to add more CPU load with a CRC do you?

You have multiple lines of defence in this protocol. The data has to be well formed, in sequence, and pass a checksum, it also appears to have a fairly limited valid character set. So checking syntax, sequence, and checksum together will likley be pretty robust. Additionally you might check that parameter values are in bounds, and of course your UART will have basic parity checking if you choose to use it.

The problem of UART Rx register overrun is better dealt with by testing the UART's overrun flag. UARTs invariably have hardware overrun detection, and interrupt generation on overrun error. If your serial input is interrupt driven, then it seems likely that you are either not enabling and processing the overrun, or that perhaps you are ignoring it and treating it as a normal receive interrupt. If you are not getting an overrun, then the problem is with the FTDI device and the data loss is occurring before it gets to the UART. The last two paragraphs below deal with possible solutions to that problem.

What baud rate does this link run at? In most cases if you are dropping characters on a typical UART data rate then the implementation is flawed. You are may be switching off interrupts inappropriately for too long, doing too much work at the interrupt level, or have inappropriate interrupt priority selection. You need to fix the root cause, and not try to fix a fundamental implementation problem at the protocol level; that is intended to cope with noisy data links, not bad software.

One other possible issue is the FTDI device. I have seen issues with multiple FTDI drivers conflicting and causing data drop-outs. The solution in that case was to use FTDI's FTClean utility to remove the drivers, and then reinstall the latest driver. FTClean appears to be absent from their site though you can obtain it indirectly through a Google search. FTDI's site does have a different removal tool which I guess superseded FTClean. Do you get the same problems with are real serial port? Also I have found USB serial devices using Prolific devices and drivers to be particularly prone to data loss, even at moderate data speeds.

Finally, I have found a number of data drop-out problems using various USB-Serial devices can be solved by "cadencing" the output. Some devices have rather small internal buffers. You may have noticed the character drop-outs start occurring after about 128 characters or whatever the USB device's internal buffer size may be. Inserting short delays (say 10ms) into the data stream can solve this issue. In this case you could simply do that at the end of each line. Another way of 'cadencing' is to poll the transmit buffer in the PC application and wait until it is empty before adding more data, and then only adding data in small chunks, in your case a single line perhaps. I have found that this usually solves data loss problems with no observable loss in data transfer performance.

answered Sep 22 '22 23:09

Clifford

If you can't change the firmware, your options are quite limited for what you can do on the PC to improve the robustness of the link. Just a few possibilities:

Lower the data rate if the firmware allows for that (reduce chance of dropped bytes).
Try to construct packets that don't contain zero or duplicate bytes, if the protocol or functionality has any flexibility to allow for that.
Send messages several times (this makes it more likely the device will receive a good message; this doesn't help to reduce "false positives" however).

If you can change the firmware, then you have much greater potential for improvement: implement a proper CRC (even an 8-bit CRC would be a significant improvement, but 16 bits would be better).

You would be best to implement auto-negotiation in the PC driver so it can talk both the "old" and "new" protocols, and figure out which type of device it's talking to.

answered Sep 22 '22 23:09

Craig McQueen

Related questions
                            
                                Low-Level control of RS232 (Com-Port) RTS/CTS/DTR/DSR possible?
                            
                                Prototyping and simulating embedded software on Windows
                            
                                Is there a way to load a binary file as a const variable in C at compile time
                            
                                what dbus performance issue could prevent it from embedded system?
                            
                                Strange definition in C with @ sign
                            
                                reason 7 - target needs reset -- unreliable debugging setup
                            
                                Is there some tiny perl that I can use in embedded system where the size would matter?
                            
                                Hibernate Embedded/Embeddable not null exception
                            
                                To size your (u)ints or not? [closed]
                            
                                Equivalent for NOP in C for Embedded?
                            
                                How to display an array range via a pointer in the IAR IDE Watch window?
                            
                                Convert ADC Bins into Voltage [closed]
                            
                                How to make your embedded C code immune to requirement changes without adding too much overhead and complexity?
                            
                                ARM Development on Linux [closed]
                            
                                How to achieve zero duty cycle PWM in AVR without glitches?
                            
                                RAM, Heap, and Stack memory for an STM32 board
                            
                                How to determine the cause for "BUS-Error"
                            
                                Can I program an ingenico/verifone terminal to do whatever I want?
                            
                                Do you think you need some simple tutorials on Microcontroller programming?
                            
                                Resources for memory management in embedded application

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With