I am trying to generate a UTF-8 QR code so that I can encode accents and Unicode characters. To test it, I am using several decoders:

1. http://zxing.org/w/decode.jspx - the zxing project, also used on Android
2. http://www.drhu.org/QRCode/QRDecoder.php - a PHP decoder
3. http://zbar.sf.net - the ZBar bar code reader, an open-source C project for embedded use

All of them always give me the same result. You can try this image, which works well with Unicode characters. But if I use zxing or the Google Chart API to generate the QR code, I cannot decode it correctly. I have tried these:

1. http://chart.apis.google.com/chart?cht=qr&chs=200x200&choe=SHIFT_JIS&chl=R%C3%A9my+Hubscher
2. http://chart.apis.google.com/chart?cht=qr&chs=200x200&choe=ISO-8859-1&chl=R%C3%A9my+Hubscher
3. http://chart.apis.google.com/chart?cht=qr&chs=200x200&choe=UTF-8&chl=R%C3%A9my+Hubscher

But all without success. Do you know how I can do this? Do you know which encoding is used for the working image?

Unicode Encoding and decoding issues in QRCode

2 Answers

Heuristics used by QR decoders often fail; a BOM does not help

Most QR decoders use heuristics to detect the character encoding automatically, even when it is specified explicitly inside the QR code via the ECI extension.

It turned out that the BOM helped your decoder, but for most decoders a BOM does not help. As an example of a decoder that cannot display a proper UTF-8 string, take a Xiaomi phone with MIUI Global v11.0.3 and its native scanner application. This phone cannot correctly show the UTF-8 QR code produced by the link in your original question; it displayed R閙y Hubscher. With the BOM (using the link from your subsequent message) it displayed ?R閙y Hubscher, that is, it just showed the BOM character as ?.

But if you put a Chinese character such as 日 before the string instead of the BOM, the Xiaomi scanner shows the string correctly. Here is the link: chart.apis.google.com/chart?cht=qr&chs=200x200&choe=UTF-8&chl=%E6%97%A5R%C3%A9my%20Hubscher Xiaomi correctly displays the string 日Rémy Hubscher from the QR code generated by this link.
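To make the three test strings concrete, here is a small Python sketch of the bytes that end up in the 8-bit segment for each variant (plain, BOM prefix, 日 prefix). The `hexdump` helper and the variable names are made up for this illustration. The Latin-1 decode at the end only shows what a wrong charset guess looks like in general; it is not a claim about which charset the Xiaomi scanner actually guessed.

```python
# UTF-8 payloads for the three variants discussed above.
plain = "Rémy Hubscher".encode("utf-8")
with_bom = "\ufeffRémy Hubscher".encode("utf-8")  # U+FEFF (BOM) prefix
with_cjk = "日Rémy Hubscher".encode("utf-8")       # CJK character prefix


def hexdump(data: bytes) -> str:
    """Hypothetical helper: format bytes as space-separated hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


for label, payload in [("plain", plain), ("BOM", with_bom), ("CJK", with_cjk)]:
    print(f"{label:>5}: {hexdump(payload)}")

# The same bytes read with the wrong charset give mojibake.
# Latin-1 is used here only because it can decode any byte sequence.
misread = plain.decode("latin-1")
print(misread)
```

The payloads differ only in their first three bytes, so whatever a decoder's heuristic does with those leading bytes decides how the rest of the string is displayed.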

Another example is the “QR code reader & QR code Scanner” Android app by TWMobile. It properly decoded the QR codes from all the links that you provided, so no BOM is needed for the TWMobile scanner to display the strings properly.

Why do QR decoders use heuristics to detect the character set even though these heuristics frequently fail, as shown in your case? As you know, there are four modes of storing text in a QR code: (1) numeric, (2) alphanumeric, (3) 8-bit byte, and (4) Kanji. The QR code standard therefore does not inherently support UTF-8. To use UTF-8 (instead of the default “ISO-8859-1” or “JIS8”) in an 8-bit string, the encoder has to insert an ECI (Extended Channel Interpretation) header before that string.

ECI is an optional, additional feature of QR codes. The good news is that it has been part of the QR code standard since at least the 2000 edition. ECI enables data to be encoded using character sets other than the default. It also allows other data interpretations (e.g. data compacted with defined compression schemes) or other industry-specific requirements to be encoded. The ECI protocol is defined in a specification developed by AIM, Inc.; it is not freely available and has to be purchased.

Unfortunately, not all QR decoders can handle the ECI protocol, even for something as basic as changing the default encoding to UTF-8. And even for the default encodings such as “ISO-8859-1” (for 8-bit byte mode) or “Shift_JIS” (for Kanji mode), decoders still use heuristics to determine the character set, because some applications that generate QR codes do not support ECI or specify an incorrect character set.
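As a rough illustration of what "inserting an ECI before the string" means at the bit level, here is a minimal Python sketch. It assumes the single-byte form of the ECI designator (assignment numbers below 128), the assignment number 26 for UTF-8 as I understand the commonly used assignments, and QR versions 1 to 9, where the byte-mode character count is 8 bits long. The function name is hypothetical, and the terminator, padding and error correction are left out.

```python
def eci_byte_segment(text: str, eci: int = 26, charset: str = "utf-8") -> str:
    """Return an ECI header plus one byte-mode segment as a '0'/'1' string.

    Sketch only: assumes ECI assignment numbers 0..127 and QR versions 1-9.
    No terminator, padding or error correction is produced.
    """
    if not 0 <= eci <= 127:
        raise ValueError("only single-byte ECI designators are handled here")
    data = text.encode(charset)
    if len(data) > 255:
        raise ValueError("payload too long for an 8-bit character count")

    bits = "0111"                      # mode indicator: ECI
    bits += format(eci, "08b")         # leading 0 bit + 7-bit assignment number
    bits += "0100"                     # mode indicator: 8-bit byte
    bits += format(len(data), "08b")   # character count = number of bytes
    bits += "".join(format(b, "08b") for b in data)  # the payload itself
    return bits


segment = eci_byte_segment("Rémy")
print(segment[:4], segment[4:12], segment[12:16], segment[16:24])
```

A decoder that honours ECI reads the `0111` indicator and the assignment number first and then knows how to interpret the bytes that follow; a decoder that ignores it is left with the raw bytes and has to guess.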

Conclusion

Because they rely on heuristics to detect the character set automatically, QR decoders often fail to display the string properly, even when the correct encoding is explicitly specified via ECI, as it was in your case. The BOM character did not help either, as shown in the Xiaomi example. You found a solution in your reply, but it does not work on Xiaomi. Some QR decoders use heuristics so crude that even a BOM does not help.

Although the BOM did help with your QR decoder, a better solution would be to stop using error-prone QR decoders that use heuristics even if the character encoding is explicitly specified via ECI.

If a decoder cannot properly decode the text without a BOM, find a better QR decoder. The encoder that you used (via the links) is fine.
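For comparison, this is roughly the behaviour one would hope for from a decoder: trust the declared ECI when there is one, and only guess when there is none. The sketch below is in Python; the function name, the small ECI table and the fallback order are assumptions made for illustration, and it is not the algorithm used by zxing, ZBar or any other decoder mentioned here.

```python
from typing import Optional

# Small subset of ECI assignment numbers, assumed for this illustration.
ECI_CHARSETS = {3: "iso-8859-1", 20: "shift_jis", 26: "utf-8"}


def decode_payload(data: bytes, eci: Optional[int] = None) -> str:
    """Decode byte-mode data, preferring the declared ECI over guessing."""
    if eci in ECI_CHARSETS:
        # The encoder told us the charset: use it and do not second-guess.
        return data.decode(ECI_CHARSETS[eci], errors="replace")

    # No usable ECI: fall back to a simple guess.
    try:
        # Strict UTF-8; the "utf-8-sig" codec also drops a leading BOM.
        return data.decode("utf-8-sig")
    except UnicodeDecodeError:
        # ISO-8859-1 can decode any byte sequence, so this never fails.
        return data.decode("iso-8859-1")


print(decode_payload("Rémy Hubscher".encode("utf-8"), eci=26))
print(decode_payload(b"\xef\xbb\xbfR\xc3\xa9my Hubscher"))
```

With this ordering the BOM becomes unnecessary whenever the ECI is present, and it is silently dropped rather than shown as `?` when it is used as a hint.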

answered Oct 07 '22 07:10

Maxim Masiutin

The solution I found is to encode the text in UTF-8 and prepend a BOM to indicate that the string is actually UTF-8.

This works:

http://chart.apis.google.com/chart?cht=qr&chs=200x200&choe=UTF-8&chl=%EF%BB%BFR%C3%A9my+Hubscher
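The same URL can be built programmatically. This is a minimal Python sketch using only the query parameters that appear in the link above; the function name and the `add_bom` flag are made up for the example, and it only constructs the string without contacting the service.

```python
from urllib.parse import quote_plus

CHART_ENDPOINT = "http://chart.apis.google.com/chart"


def qr_chart_url(text: str, size: str = "200x200", add_bom: bool = True) -> str:
    """Build a QR chart URL, optionally prefixing the text with a BOM."""
    payload = ("\ufeff" + text) if add_bom else text
    # quote_plus percent-encodes the UTF-8 bytes and turns spaces into '+'.
    chl = quote_plus(payload, encoding="utf-8")
    return f"{CHART_ENDPOINT}?cht=qr&chs={size}&choe=UTF-8&chl={chl}"


print(qr_chart_url("Rémy Hubscher"))
print(qr_chart_url("Rémy Hubscher", add_bom=False))
```

The BOM shows up as `%EF%BB%BF` at the start of the `chl` value, which is the only difference from the UTF-8 link in the question.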

answered Oct 07 '22 06:10

Natim
