
The length of Arabic letters in Lua

In Lua, when I try to get the length of a single Arabic letter (such as "ف"), the result is 2!

Ex.

local letter = "ف"
print( letter:len() )

Output: 2

The same problem occurs when I use string.sub. If I want to print the first letter of an Arabic word, I can't use sub(1,1).

Ex.

local word_1 = "فولت"
print( word_1:sub(1,2) )

Output: ف
As you can see, I had to pass 2 rather than 1 as the second argument to get the correct answer.
If I pass 1 as the second argument, the result is:

print( word_1:sub(1,1) )

Output: Ù

Why does Lua count the length of a single Arabic letter as 2?

And is there a way to get the correct length, which is 1?

Ali asked Jan 15 '14 13:01




1 Answer

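Lua strings are sequences of bytes, so string.len and string.sub count bytes, not characters. In UTF-8, "ف" is encoded as two bytes, which is why its length is reported as 2.

Lua 5.3 has since been released, and it provides a basic UTF-8 library.

utf8.len returns the number of characters in a UTF-8 string: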

print(utf8.len("ف"))
-- 1
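
For the string.sub part of the question, here is a rough sketch of one way to take a substring by character position rather than byte position. It assumes Lua 5.3 or later, a source file saved as UTF-8, and valid UTF-8 input. The helper name utf8_sub is hypothetical (it is not part of the standard library); it relies on utf8.offset to translate character positions into byte positions.

```lua
-- Requires Lua 5.3+ (the utf8 library is not available in 5.1/5.2).
local letter = "ف"

-- "ف" is U+0641, encoded in UTF-8 as the two bytes 0xD9 0x81.
-- The length operator and string.len count bytes.
print(#letter)             -- 2
print(letter:byte(1, -1))  -- 217 129
print(utf8.len(letter))    -- 1

-- Hypothetical helper: substring by character positions i..j
-- (positive, 1-based, inclusive). Assumes s is valid UTF-8.
local function utf8_sub(s, i, j)
  j = j or utf8.len(s)
  -- Byte position where the i-th character starts (nil if out of range).
  local first = utf8.offset(s, i)
  if not first then return "" end
  -- Byte position where the character after j starts (nil if past the end).
  local after = utf8.offset(s, j + 1)
  return s:sub(first, after and after - 1 or -1)
end

local word_1 = "فولت"
print(utf8_sub(word_1, 1, 1))  -- ف
print(utf8_sub(word_1, 2, 3))  -- ول
```

The byte 0xD9 on its own is not a complete character, which is why word_1:sub(1,1) shows up as "Ù" on a terminal that interprets it as Latin-1.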
Yu Hao answered Oct 10 '22 19:10