
Does Lua support Unicode?

Based on the link below, I'm confused as to whether the Lua programming language supports Unicode.

http://lua-users.org/wiki/LuaUnicode

It appears that it does, but with limitations. What I don't understand is whether those limitations are significant or not a big deal.

TimK asked Mar 23 '10 05:03



2 Answers

You can certainly store Unicode strings in Lua, as UTF-8. You can use these as you would any string.

However, Lua doesn't provide any default support for higher-level "Unicode-aware" operations on such strings (e.g., counting string length in characters, converting lower case to upper case). Whether this lack matters to you depends on what you intend to do with these strings.
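
To make the byte-versus-character distinction concrete, here is a minimal sketch. It assumes the source file is saved as UTF-8; the variable name is only for illustration.

```lua
-- Assumes this source file is saved as UTF-8.
local s = "開発"         -- two characters, each encoded as three bytes

-- The length operator and string.len both count bytes, not characters.
print(#s)                --> 6
print(string.len(s))     --> 6

-- string.byte exposes the raw UTF-8 bytes that Lua actually stores.
print(s:byte(1, -1))     --> 233 150 139 231 153 186
```

Lua stores and returns the bytes faithfully; it simply has no notion that six bytes here form two characters.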

Possible approaches, depending on your use:

  1. If you just want to input/output/store strings, and generally use them as "whole units" (for table indexing etc), you may not need any special handling at all. In this case, you just treat these strings as binary blobs.

  2. Due to UTF-8's clever design, some types of string manipulation can be done on strings containing UTF-8 and will yield the correct result without taking any special care.

    For instance, you can append strings, split them apart before/after ASCII characters, etc. As an example, if you have a string "開発.txt" and you search for "." in that string using string.find (string_var, ".", 1, true), and then split it using the normal string.sub function into "開発" and ".txt", those result strings will be correct UTF-8 strings even though you're not using any kind of "Unicode-aware" algorithm. (The fourth argument requests a plain-text search; without it, "." is a pattern that matches any character.)

    Similarly, you can do case conversions on only the ASCII letters in strings (bytes with the high bit zero), and treat the rest of the string as binary without corrupting it.

  3. Some UTF-8-aware operations are so simple that it's easy to just write one's own functions to do them.

    For instance, to calculate the length of a string in Unicode code points, count the bytes with the high bit zero (ASCII characters) and the bytes whose top two bits are 11 ("leading bytes" of non-ASCII characters); the length is the sum of those two counts.

  4. For more complex operations (e.g., case conversion on non-ASCII characters), you'll probably have to use a Lua Unicode library, such as those on the (previously mentioned) Lua-users Unicode page.
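
The byte-level approaches from items 2 and 3 can be sketched roughly as follows. The helper names (split_at_dot, ascii_upper, utf8_len) are hypothetical and not part of any library, and the sketch assumes well-formed UTF-8 input and a source file saved as UTF-8.

```lua
-- Assumes this source file is saved as UTF-8 and inputs are valid UTF-8.

-- Item 2: split at an ASCII character using ordinary byte-oriented calls.
local function split_at_dot(name)
  -- plain = true, so "." is a literal dot rather than the "any character" pattern
  local pos = string.find(name, ".", 1, true)
  if not pos then return name, "" end
  return string.sub(name, 1, pos - 1), string.sub(name, pos)
end

-- Item 2: upper-case ASCII letters only; bytes >= 0x80 are never touched.
local function ascii_upper(s)
  return (s:gsub("[a-z]", function(c)
    return string.char(c:byte() - 32)
  end))
end

-- Item 3: count code points by counting every byte that is not a
-- continuation byte (continuation bytes look like 10xxxxxx, i.e. 0x80-0xBF).
local function utf8_len(s)
  local count = 0
  for i = 1, #s do
    local b = s:byte(i)
    if b < 0x80 or b >= 0xC0 then
      count = count + 1
    end
  end
  return count
end

local stem, ext = split_at_dot("開発.txt")
print(stem, ext)                  -- the two halves, each still valid UTF-8
print(ascii_upper("開発.txt"))    -- only "txt" is changed
print(utf8_len("開発.txt"))       --> 6
```

These helpers count code points, not user-perceived characters, so combining sequences count as more than one. Lua 5.3 and later also ship a built-in utf8 library with utf8.len; the sketch sticks to plain byte operations so that it also runs on older versions.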

snogglethorpe answered Sep 19 '22 20:09


Lua does not have any built-in support for Unicode (other than accepting any byte value in strings). The slnunicode library, however, provides many Unicode string functions, for example unicode.utf8.len.
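
As a rough sketch of how that might look in practice: the module name "unicode" below is an assumption about how slnunicode is installed, and char_len is a hypothetical wrapper. If the library cannot be loaded, the sketch falls back to counting non-continuation bytes, so it runs either way.

```lua
-- Try to load slnunicode; the module name "unicode" is an assumption.
local ok, mod = pcall(require, "unicode")
-- Older builds may register a global table instead of returning one.
local lib = ok and (type(mod) == "table" and mod or rawget(_G, "unicode")) or nil
local has_sln = type(lib) == "table"
  and type(lib.utf8) == "table"
  and type(lib.utf8.len) == "function"

-- Hypothetical wrapper: library call if available, byte counting otherwise.
local function char_len(s)
  if has_sln then
    return lib.utf8.len(s)
  end
  -- Fallback: count every byte that is not a UTF-8 continuation byte.
  local _, n = s:gsub("[^\128-\191]", "")
  return n
end

print(char_len("abc"))   --> 3
```

The fallback only gives a sensible answer for well-formed UTF-8; a dedicated library is the better choice once validation or case mapping is needed.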

(note: this answer is completely stolen from grom's comment on another question - I just think it deserves its own answer)

Johannes Hoff answered Sep 23 '22 20:09