Encoding UTF-8 characters in QR Code symbols

Since iOS 7, it has been possible to generate a QR Code using the CIFilter named CIQRCodeGenerator from the Core Image framework.

In the documentation, Apple indicates that strings used to generate a QR Code must be encoded with NSISOLatin1StringEncoding:

To create a QR code from a string or URL, convert it to an NSData object using the NSISOLatin1StringEncoding string encoding.

However, I tried encoding Chinese characters with NSUTF8StringEncoding and it works pretty well. Could I run into problems by using NSUTF8StringEncoding? Are there any known issues?

Sébastien MICHOY asked Sep 10 '25 01:09



1 Answer

What follows is general advice; someone with knowledge of the Core Image framework may be able to provide a more specific answer. Nevertheless, I hope it clarifies why the library provides such specific encoding advice, the likely consequences of ignoring that advice, and how you might still encode characters that are not available in Latin-1.

In general, the ISO/IEC 18004 standard for QR Code ("QR Code 2005"), and all other international standards for 2D barcodes, specify that the Latin-1 character encoding must be used when interpreting the QR Code byte sequence returned by readers, except where an Extended Channel Interpretation (ECI) sequence specifying an alternative character encoding has been provided in the data.

It is, however, so common for users to encode the data using UTF-8 that in practice most barcode readers use a proprietary heuristic to guess whether the content is in some encoding other than Latin-1, such as UTF-8. In many cases this leads to ambiguity and will result in misreads, especially when arbitrary data is used in open applications.
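The ambiguity can be illustrated at the byte level without any barcode library. The following is a minimal sketch, assuming a reader that tries UTF-8 first and falls back to Latin-1; `guess_decode` is a hypothetical stand-in for this kind of heuristic, not the algorithm of any particular reader.

```python
def guess_decode(payload: bytes) -> str:
    """Hypothetical reader heuristic: prefer UTF-8, fall back to Latin-1."""
    try:
        return payload.decode("utf-8")
    except UnicodeDecodeError:
        return payload.decode("latin-1")


# Chinese text cannot be represented in Latin-1, so it is sent as UTF-8 bytes.
utf8_payload = "中文".encode("utf-8")

# A reader that follows the default interpretation (Latin-1) produces mojibake.
standard_reading = utf8_payload.decode("latin-1")

# A reader that guesses UTF-8 recovers the intended text.
heuristic_reading = guess_decode(utf8_payload)

# The guess can also go wrong: this legitimate Latin-1 payload ("Ã©") happens
# to be a valid UTF-8 sequence, so the heuristic reports a different string.
latin1_payload = "Ã©".encode("latin-1")
misread = guess_decode(latin1_payload)

# A Latin-1 payload that is not valid UTF-8 falls back and is read correctly.
fallback_reading = guess_decode("café".encode("latin-1"))
```

Here the same two bytes, 0xC3 0xA9, can mean either "Ã©" or "é", and without an ECI the reader has nothing in the symbol to tell the two apart.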

If you intend to be rigorous and the data must be encoded using UTF-8, then the encoding library needs to support[*] setting ECI 000026 before the UTF-8 data.
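To make "setting ECI 000026 before the UTF-8 data" concrete, here is a rough sketch of the start of the bit stream an encoder would produce. It rests on my understanding of the ISO/IEC 18004 layout (ECI mode indicator 0111, a one-byte designator for assignment numbers 0 to 127, byte mode indicator 0100, and an 8-bit character count for symbol versions 1 to 9). The function name `eci_byte_segment_bits` and its parameters are hypothetical, and it is not a complete QR encoder: terminator, padding, error correction and module placement are all omitted.

```python
def eci_byte_segment_bits(text: str, eci: int = 26, count_bits: int = 8) -> str:
    """Return the ECI header plus byte-mode segment for `text` as a bit string.

    Assumptions: single-byte ECI designator (0..127) and an 8-bit character
    count, which is the byte-mode count length for small symbol versions.
    """
    if not 0 <= eci <= 127:
        raise ValueError("this sketch only handles single-byte ECI designators")
    data = text.encode("utf-8")
    if len(data) >= 1 << count_bits:
        raise ValueError("data too long for the chosen character count length")

    bits = "0111"                                     # ECI mode indicator
    bits += format(eci, "08b")                        # ECI designator: 0 + 7 bits
    bits += "0100"                                    # byte mode indicator
    bits += format(len(data), f"0{count_bits}b")      # number of data bytes
    bits += "".join(format(b, "08b") for b in data)   # the UTF-8 bytes
    return bits


segment = eci_byte_segment_bits("é")
header = segment[:16]  # "0111000110100100": ECI 26, then byte mode
```

A reader that honours the ECI sees assignment number 26 before the byte segment and decodes the bytes as UTF-8 without having to guess.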

Edit 2020: I have produced a detailed article describing precisely this issue and the work that is currently being undertaken by the standards bodies to promote the use of ECI: https://www.linkedin.com/pulse/enhanced-channel-interpretation-terry-burton/

The register of assigned ECI codes is available from the AIM store as "ECI Part 3: Register" for a fee.

[*] With CIQRCodeGenerator this does not appear to be the case.

Terry Burton answered Sep 12 '25 17:09
