Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

what is the way to represent a unichar in lua

Tags:

unicode

lua

If I need to have the following python value, unicode char '0':

>>> unichr(0)
u'\x00'

How can I define it in Lua?

like image 349
Tzury Bar Yochay Avatar asked Oct 15 '11 19:10

Tzury Bar Yochay


2 Answers

There isn't one.

Lua has no concept of a Unicode value. Lua has no concept of Unicode at all. All Lua strings are 8-bit sequences of "characters", and all Lua string functions will treat them as such. Lua does not treat strings as having any Unicode encoding; they're just a sequence of bytes.

You can insert an arbitrary number into a string. For example:

"\065\066"

Is equivalent to:

"AB"

The \ notation is followed by 3 digits (or one of the escape characters), which must be less than or equal to 255. Lua is perfectly capable of handling strings with embedded \000 characters.

But you cannot directly insert Unicode codepoints into Lua strings. You can decompose the codepoint into UTF-8 and use the above mechanism to insert the codepoint into a string. For example:

"x\226\131\151"

This is the x character followed by the Unicode combining above arrow character.

But since no Lua functions actually understand UTF-8, you will have to expose some function that expects a UTF-8 string in order for it to be useful in any way.

like image 119
Nicol Bolas Avatar answered Sep 29 '22 01:09

Nicol Bolas


How about

function unichr(ord)
    if ord == nil then return nil end
    if ord < 32 then return string.format('\\x%02x', ord) end
    if ord < 126 then return string.char(ord) end
    if ord < 65539 then return string.format("\\u%04x", ord) end
    if ord < 1114111 then return string.format("\\u%08x", ord) end
end
like image 22
Tzury Bar Yochay Avatar answered Sep 29 '22 00:09

Tzury Bar Yochay