Based on the page linked below, I'm confused as to whether the Lua programming language supports Unicode.

http://lua-users.org/wiki/LuaUnicode

It appears that it does, but with limitations. What I don't understand is whether those limitations are significant or not a big deal.


Does Lua support Unicode?

2 Answers

You can certainly store Unicode strings in Lua, as UTF-8. You can use these as you would any string.

However, Lua provides very little built-in support for higher-level "Unicode-aware" operations on such strings, e.g., counting string length in characters or converting between lower and upper case. (Lua 5.3 and later include a small utf8 library with functions such as utf8.len, but no case conversion; earlier versions have nothing.) Whether this lack matters really depends on what you intend to do with these strings.

Possible approaches, depending on your use:

1. If you just want to input, output, and store strings, and generally use them as "whole units" (for table indexing, etc.), you may not need any special handling at all. In this case, you just treat these strings as binary blobs.

2. Due to UTF-8's clever design, some types of string manipulation can be done on strings containing UTF-8 and will yield the correct result without taking any special care.

   For instance, you can append strings, split them apart before or after ASCII characters, etc. As an example, if you have a string "開発.txt" and you search for "." in that string using string.find(string_var, ".", 1, true), and then split it using the normal string.sub function into "開発" and ".txt", those result strings will be correct UTF-8 strings even though you're not using any kind of "Unicode-aware" algorithm. (The fourth argument, true, requests a plain-text search; without it, "." is a Lua pattern that matches any single character, so the search would stop at the first byte.)

   Similarly, you can do case conversions on only the ASCII characters in strings (those with the high bit zero), and treat the rest of the string as binary without corrupting it.

3. Some UTF-8-aware operations are so simple that it's easy to write your own functions for them.

   For instance, to calculate the length of a string in Unicode characters, count the bytes with the high bit zero (ASCII characters) and the bytes with the top two bits 11 ("leading bytes" of non-ASCII characters); the length is the sum of those two counts.

4. For more complex operations (e.g., case conversion on non-ASCII characters), you'll probably have to use a Lua Unicode library, such as those on the previously mentioned lua-users Unicode page.
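As a rough illustration of the byte-level operations described above (splitting at an ASCII character and ASCII-only case conversion), here is a minimal sketch. The helper names split_at and ascii_upper are made up for this example and are not part of any library; the sketch also assumes the source file itself is saved as UTF-8.

```lua
-- Hypothetical helpers (names invented for this sketch).

-- Split s just before the first occurrence of the ASCII string sep.
-- The fourth argument to string.find (true) requests a plain-text search,
-- so "." is matched literally rather than as the "any character" pattern.
local function split_at(s, sep)
  local pos = string.find(s, sep, 1, true)
  if not pos then
    return s, ""          -- separator not found: nothing to split off
  end
  return string.sub(s, 1, pos - 1), string.sub(s, pos)
end

-- Upper-case only the ASCII letters a-z. Every byte of a multi-byte UTF-8
-- sequence is 0x80 or above, so it never falls in the a-z byte range and
-- is copied through unchanged.
local function ascii_upper(s)
  return (string.gsub(s, "[a-z]", function(c)
    return string.char(string.byte(c) - 32)
  end))
end

local name, ext = split_at("開発.txt", ".")
print(name, ext)
print(ascii_upper("開発.txt"))
```

Both functions only ever cut or rewrite at ASCII bytes, which is why the multi-byte sequences come out intact.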
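The character-counting rule described above can be sketched in a few lines. This is only an illustration: utf8_length is a made-up name, the input is assumed to be valid UTF-8, and the result is a count of code points, which is not always what a user perceives as "characters" (combining marks are counted separately).

```lua
-- Hypothetical helper (name invented for this sketch).
-- Counts every byte that starts a character: ASCII bytes (high bit zero,
-- below 0x80) and leading bytes of multi-byte sequences (top two bits 11,
-- 0xC0 and above). Continuation bytes (0x80-0xBF) are skipped.
local function utf8_length(s)
  local count = 0
  for i = 1, #s do
    local b = string.byte(s, i)
    if b < 0x80 or b >= 0xC0 then
      count = count + 1
    end
  end
  return count
end

print(utf8_length("開発.txt"))  --> 6  (characters)
print(#"開発.txt")              --> 10 (bytes)
```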


answered Sep 19 '22 20:09

snogglethorpe

Core Lua (before the small utf8 library added in 5.3) does not have any support for Unicode, other than accepting any byte value in strings. The slnunicode library has a lot of Unicode string functions, however, for example unicode.utf8.len.

(note: this answer is completely stolen from grom's comment on another question - I just think it deserves its own answer)

answered Sep 23 '22 20:09

Johannes Hoff


TimK
