I'm working on an existing driver that controls an 8-bit MCU over a serial port. There are many different flavors of firmware for the MCU but they all share a common method of trying to ensure link integrity. This method is not very robust and I am looking for ideas about how the driver could modify its behavior to get the most out of it.
Commands are gcode with a line number and a checksum:
N3 T0*57
N4 G92 E0*67
N5 G28*22
N6 G1 F1500.0*82
N7 G1 X2.0 Y2.0 F3000.0*85
N8 G1 X3.0 Y3.0*33
The line number must be sequential (but can be reset with M110). If the checksum doesn't match or the line number is out of sequence, the firmware will reply Resend: nnn, where nnn is the last successful N plus 1. The "checksum" is extremely primitive:
// Calc checksum.
byte checksum = 0;
byte count = 0;
while (instruction[count] != '*')
    checksum = checksum ^ instruction[count++];
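For reference, a host-side framing routine would mirror this: prepend the line number, XOR every byte up to (but not including) the '*', and append the checksum in decimal. A minimal sketch (the function name frame_line is mine, not from any particular host):

#include <stdio.h>

/* Frame a bare g-code command with a line number and the XOR checksum
 * described above. "N3 T0" XORs to 57, matching the first example line. */
static void frame_line(char *out, size_t outlen, unsigned n, const char *cmd)
{
    char body[128];
    snprintf(body, sizeof body, "N%u %s", n, cmd);

    unsigned char checksum = 0;
    for (const char *p = body; *p; ++p)
        checksum ^= (unsigned char)*p;

    snprintf(out, outlen, "%s*%u\n", body, (unsigned)checksum);
}

int main(void)
{
    char line[160];
    frame_line(line, sizeof line, 3, "T0");
    fputs(line, stdout);   /* prints "N3 T0*57" */
    return 0;
}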
The main problem is that the primary error mechanism is dropped bytes due to interrupts being held off leading to overflows in the 1-byte MCU FIFO. The actual serial bus is a few cm between an FTDI (or similar) USB serial bridge and the MCU, so bit errors are unlikely. I've never observed a bit error in a reply from the firmware.
As you can see, the algorithm above would detect one dropped byte, but if you dropped two of the same byte (anywhere!) the result would still match. Thus F3000.0 (feedrate 3000 mm/min) could be transmuted to F30.0 and still match. Plus, the character set is very small, so certain bits are never going to be involved.
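A quick self-contained check (using the same XOR as the firmware) shows that failure mode:

#include <assert.h>

static unsigned char xor_sum(const char *s)
{
    unsigned char c = 0;
    while (*s)
        c ^= (unsigned char)*s++;
    return c;
}

int main(void)
{
    /* Dropping the adjacent "00" from F3000.0 is invisible because x ^ x == 0. */
    assert(xor_sum("N7 G1 X2.0 Y2.0 F3000.0") == xor_sum("N7 G1 X2.0 Y2.0 F30.0"));
    return 0;
}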
Is there anything the driver can do to make a given line more robust?
The driver does have some freedom in how a line is encoded:
- parameter order (X1 Y1 is the same as Y1 X1)
- exact values (F2999.9 instead of F3000)
- the choice of N for a given line
- splitting commands (G1 X2 becomes G1 X1 then G1 X2, assuming X=0 initially)
- optional fields (T0 is meaningless for most commands, and if you send F3000 once it is implied in the future so it can be optionally sent or not)

If I believe that the firmware drops bytes in groups, then the most important thing could be to avoid back-to-back duplicates like 00, which would (if dropped together) be invisible.
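As a rough sketch of that last idea (the helper below is hypothetical, not from any existing host), the driver could scan each framed line and only re-encode when an adjacent duplicate pair is present:

#include <stdbool.h>
#include <stddef.h>

/* Return true if the framed line contains two identical adjacent bytes,
 * i.e. a pair the XOR checksum could not detect if both were dropped. */
static bool has_adjacent_duplicates(const char *line)
{
    for (size_t i = 1; line[i] != '\0'; ++i)
        if (line[i] == line[i - 1])
            return true;
    return false;
}

If the check fires, the driver could try an equivalent encoding, for example F2999.9 in place of F3000.0 or a different parameter order, and test again.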
One thing you could try is to configure the UART on the host to send 2 stop bits instead of 1 (which is what you're probably currently using). The MCU receiver will notice nothing except that there's an extra bit-time of idle between characters. That's about 10% more time to get the character out of the receive register before the next character is shifted in.
Typically, a UART doesn't look for more than 1 stop bit on receive data, even if the UART is configured for 2 stop bits (there's no reason to enforce extra stop bits on receive), so the fact that the MCU will still send only a single stop bit shouldn't cause any problems on receiving responses from the device.
If you're at a high data rate this doesn't add much time, so it probably won't help (but that depends on what the underlying cause of the overruns is). If my arithmetic is right, it works out to another 25 microseconds for the MCU to avoid an overrun if you're running the link at 38400 bps.
It's a long shot, but it's a cheap change that should require no modifications other than the serial port configuration on the host side.
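On a POSIX host that's a one-flag termios change, something like this (a sketch; the function name is mine):

#include <termios.h>

/* Ask the host-side UART for two stop bits on transmit; everything else
 * is left exactly as the driver already configures it. */
int use_two_stop_bits(int fd)
{
    struct termios tio;
    if (tcgetattr(fd, &tio) != 0)
        return -1;
    tio.c_cflag |= CSTOPB;      /* send 2 stop bits instead of 1 */
    return tcsetattr(fd, TCSANOW, &tio);
}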
We may not all be familiar with G-code; a link is always helpful for domain-specific technology.
I would say that the simple checksum is probably adequately suited to the data length, format, and the processor performance. If you are already dropping characters, you hardly want to add more CPU load with a CRC, do you?
You have multiple lines of defence in this protocol. The data has to be well formed, in sequence, and pass a checksum; it also appears to have a fairly limited valid character set. So checking syntax, sequence, and checksum together will likely be pretty robust. Additionally you might check that parameter values are in bounds, and of course your UART will have basic parity checking if you choose to use it.
The problem of UART Rx register overrun is better dealt with by testing the UART's overrun flag. UARTs invariably have hardware overrun detection, and interrupt generation on overrun error. If your serial input is interrupt driven, then it seems likely that you are either not enabling and processing the overrun, or that perhaps you are ignoring it and treating it as a normal receive interrupt. If you are not getting an overrun, then the problem is with the FTDI device and the data loss is occurring before it gets to the UART. The last two paragraphs below deal with possible solutions to that problem.
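As an illustration only (the firmware here may not be yours to change), an AVR-class receive ISR can test the data-overrun flag before pulling the byte; the register names below are the ATmega328-style USART ones, and other parts will differ:

#include <stdint.h>
#include <avr/io.h>
#include <avr/interrupt.h>

volatile uint8_t rx_overruns;   /* incremented whenever the UART reports overrun */

ISR(USART_RX_vect)
{
    /* Read the status register before UDR0; reading UDR0 clears the error flags. */
    uint8_t status = UCSR0A;
    uint8_t data   = UDR0;

    if (status & (1 << DOR0))
        rx_overruns++;          /* at least one byte was lost before this one */

    /* ... push 'data' into the receive ring buffer ... */
    (void)data;
}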
What baud rate does this link run at? In most cases, if you are dropping characters at a typical UART data rate then the implementation is flawed. You may be switching off interrupts inappropriately or for too long, doing too much work at the interrupt level, or have an inappropriate interrupt priority selection. You need to fix the root cause, and not try to fix a fundamental implementation problem at the protocol level; that is intended to cope with noisy data links, not bad software.
One other possible issue is the FTDI device. I have seen issues with multiple FTDI drivers conflicting and causing data drop-outs. The solution in that case was to use FTDI's FTClean utility to remove the drivers and then reinstall the latest driver. FTClean appears to be absent from their site, though you can obtain it indirectly through a Google search. FTDI's site does have a different removal tool which I guess superseded FTClean. Do you get the same problems with a real serial port? Also, I have found USB serial devices using Prolific chips and drivers to be particularly prone to data loss, even at moderate data rates.
Finally, I have found a number of data drop-out problems using various USB-Serial devices can be solved by "cadencing" the output. Some devices have rather small internal buffers. You may have noticed the character drop-outs start occurring after about 128 characters or whatever the USB device's internal buffer size may be. Inserting short delays (say 10ms) into the data stream can solve this issue. In this case you could simply do that at the end of each line. Another way of 'cadencing' is to poll the transmit buffer in the PC application and wait until it is empty before adding more data, and then only adding data in small chunks, in your case a single line perhaps. I have found that this usually solves data loss problems with no observable loss in data transfer performance.
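On a POSIX host, one way to cadence the stream is to hand the port a single line and then block until the OS transmit buffer has drained (a sketch; the function name is mine):

#include <string.h>
#include <termios.h>
#include <unistd.h>

/* Send one framed g-code line, then wait for the host-side transmit
 * buffer to empty before the caller queues the next line. */
static int send_line_cadenced(int fd, const char *line)
{
    size_t len = strlen(line);
    if (write(fd, line, len) != (ssize_t)len)
        return -1;
    return tcdrain(fd);         /* block until the bytes have left the port */
}

Note that with a USB bridge, tcdrain() may only cover the host-side buffer and not the adapter's internal FIFO, so the short fixed delay mentioned above may still be needed.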
If you can't change the firmware, your options are quite limited for what you can do on the PC to improve the robustness of the link. Just a few possibilities:
If you can change the firmware, then you have much greater potential for improvement: implement a proper CRC (even an 8-bit CRC would be a significant improvement, but 16 bits would be better).
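For illustration, a bit-wise 8-bit CRC small enough for an 8-bit MCU might look like the following (the polynomial 0x07 is just a common choice, not something the firmware specifies):

#include <stdint.h>
#include <stddef.h>

/* Bit-wise CRC-8 over the message bytes, polynomial x^8 + x^2 + x + 1 (0x07).
 * Unlike the XOR sum, deleting a pair of identical bytes changes the result. */
static uint8_t crc8(const uint8_t *data, size_t len)
{
    uint8_t crc = 0;
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int b = 0; b < 8; ++b)
            crc = (crc & 0x80) ? (uint8_t)((crc << 1) ^ 0x07) : (uint8_t)(crc << 1);
    }
    return crc;
}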
You would be best to implement auto-negotiation in the PC driver so it can talk both the "old" and "new" protocols, and figure out which type of device it's talking to.