Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Making the best of a bad "checksum" algorithm

I'm working on an existing driver that controls an 8-bit MCU over a serial port. There are many different flavors of firmware for the MCU but they all share a common method of trying to ensure link integrity. This method is not very robust and I am looking for ideas about how the driver could modify its behavior to get the most out of it.

Commands are gcode with a line number and a checksum:

N3 T0*57
N4 G92 E0*67
N5 G28*22
N6 G1 F1500.0*82
N7 G1 X2.0 Y2.0 F3000.0*85
N8 G1 X3.0 Y3.0*33

The line number must be sequential (but can be reset with M110). If the checksum doesn't match or the line number is out of sequence the firmware will reply Resend: nnn where nnn is the last successful N plus 1. The "checksum" is extremely primitive:

        // Calc checksum.
        byte checksum = 0;
        byte count = 0;
        while(instruction[count] != '*')
                checksum = checksum^instruction[count++];

The main problem is that the primary error mechanism is dropped bytes due to interrupts being held off leading to overflows in the 1-byte MCU FIFO. The actual serial bus is a few cm between an FTDI (or similar) USB serial bridge and the MCU, so bit errors are unlikely. I've never observed a bit error in a reply from the firmware.

As you can see, the algorithm above would detect one dropped byte, but if you dropped two of the same byte (anywhere!) the result would still match. Thus F3000.0 (feedrate 3000mm/min) could be transmuted to F30.0 and still match. Plus, the character set is very small so certain bits are never going to be involved.

Is there anything the driver can do to make a given line more robust?

  • Add or drop trailing (or even leading) zeroes
  • Add or drop spaces
  • Re-order words (X1 Y1 is the same as Y1 X1)
  • Add or drop spaces
  • Make "insignificant" modifications to values within some tolerance (eg F2999.9 instead of F3000)
  • Reset the line number to get some specific N for a given line
  • Break a single command into multiple equivalent commands (eg G1 X2 becomes G1 X1 G1 X2 assuming X=0 initially)
  • Eliminate (or add) superfluous words (eg T0 is meaningless for most commands, and if you send F3000 once it is implied in the future so it can be optionally sent or not)

If I believe that the firmware drops bytes in groups then the most important thing could be to avoid back-to-back duplicates like 00 which would (if dropped together) be invisible.

like image 475
Ben Jackson Avatar asked Apr 05 '11 18:04

Ben Jackson


People also ask

What is the best checksum algorithm?

The SHA family of algorithms is published by the National Institute of Standards and Technology. One algorithm, SHA-1, produces a 160-bit checksum and is the best-performing checksum, followed by the 256-bit and 512-bit versions.

What is checksum algorithm?

When you enter a key part, you enter two key part halves and a checksum for the key part. The checksum is a two-digit number you calculate using the key part and the checksum algorithm. When you enter the key part and the checksum, ICSF calculates the checksum for the key part you entered.

What is an XOR checksum?

The simplest checksum algorithm is the so-called longitudinal parity check, which breaks the data into "words" with a fixed number n of bits, and then computes the exclusive or (XOR) of all those words. The result is appended to the message as an extra word.


3 Answers

One thing you could try is to configure the UART on the host to send 2 stop bits instead of 1 (which is what you're probably currently using). The MCU receiver will notice nothing except that there's an extra bit-time of idle between characters. That's about 10% more time to get the character out of the receive register before the next character is shifted in.

Typically, a UART doesn't look for more than 1 stop bit on receive data, even if the UART is configured for 2 stop bits (there's no reason to enforce extra stop bits on receive), so the fact that the MCU will still send only a single stop bit shouldn't cause any problems on receiving responses from the device.

If you're at a high data rate this doesn't add much time, so it probably won't help (but that depends on what's the underlying cause of the overruns is). If my ciphering is right, it works out to another 25 microseconds for the MCU to avoid an overrun if you're running the link at 38400 bps.

It's a long shot, but it's a cheap change that should require no modifications other than the serial port configuration on the host side.

like image 83
Michael Burr Avatar answered Sep 24 '22 23:09

Michael Burr


We may not all be familiar with G-Code, a link is always helpful for domain specific technology.

I would say that the simple checksum is probably adequately suited to the data length, format, and the processor performance. If you are already dropping characters, you hardly want to add more CPU load with a CRC do you?

You have multiple lines of defence in this protocol. The data has to be well formed, in sequence, and pass a checksum, it also appears to have a fairly limited valid character set. So checking syntax, sequence, and checksum together will likley be pretty robust. Additionally you might check that parameter values are in bounds, and of course your UART will have basic parity checking if you choose to use it.

The problem of UART Rx register overrun is better dealt with by testing the UART's overrun flag. UARTs invariably have hardware overrun detection, and interrupt generation on overrun error. If your serial input is interrupt driven, then it seems likely that you are either not enabling and processing the overrun, or that perhaps you are ignoring it and treating it as a normal receive interrupt. If you are not getting an overrun, then the problem is with the FTDI device and the data loss is occurring before it gets to the UART. The last two paragraphs below deal with possible solutions to that problem.

What baud rate does this link run at? In most cases if you are dropping characters on a typical UART data rate then the implementation is flawed. You are may be switching off interrupts inappropriately for too long, doing too much work at the interrupt level, or have inappropriate interrupt priority selection. You need to fix the root cause, and not try to fix a fundamental implementation problem at the protocol level; that is intended to cope with noisy data links, not bad software.

One other possible issue is the FTDI device. I have seen issues with multiple FTDI drivers conflicting and causing data drop-outs. The solution in that case was to use FTDI's FTClean utility to remove the drivers, and then reinstall the latest driver. FTClean appears to be absent from their site though you can obtain it indirectly through a Google search. FTDI's site does have a different removal tool which I guess superseded FTClean. Do you get the same problems with are real serial port? Also I have found USB serial devices using Prolific devices and drivers to be particularly prone to data loss, even at moderate data speeds.

Finally, I have found a number of data drop-out problems using various USB-Serial devices can be solved by "cadencing" the output. Some devices have rather small internal buffers. You may have noticed the character drop-outs start occurring after about 128 characters or whatever the USB device's internal buffer size may be. Inserting short delays (say 10ms) into the data stream can solve this issue. In this case you could simply do that at the end of each line. Another way of 'cadencing' is to poll the transmit buffer in the PC application and wait until it is empty before adding more data, and then only adding data in small chunks, in your case a single line perhaps. I have found that this usually solves data loss problems with no observable loss in data transfer performance.

like image 24
Clifford Avatar answered Sep 22 '22 23:09

Clifford


If you can't change the firmware, your options are quite limited for what you can do on the PC to improve the robustness of the link. Just a few possibilities:

  • Lower the data rate if the firmware allows for that (reduce chance of dropped bytes).
  • Try to construct packets that don't contain zero or duplicate bytes, if the protocol or functionality has any flexibility to allow for that.
  • Send messages several times (this makes it more likely the device will receive a good message; this doesn't help to reduce "false positives" however).

If you can change the firmware, then you have much greater potential for improvement: implement a proper CRC (even an 8-bit CRC would be a significant improvement, but 16 bits would be better).

You would be best to implement auto-negotiation in the PC driver so it can talk both the "old" and "new" protocols, and figure out which type of device it's talking to.

like image 41
Craig McQueen Avatar answered Sep 22 '22 23:09

Craig McQueen