For example: I want to remove the first 2 letters from the strings "ПРИВЕТ" and "HELLO."; one of them contains only two-byte Unicode characters.
I tried string.sub("ПРИВЕТ", 3) and string.sub("HELLO.", 3)
and got "РИВЕТ" and "LLO.".
string.sub() removed 2 bytes (not characters) from these strings, so I want to know how to remove characters instead.
Something like utf8.sub().
The key standard function for this task is utf8.offset(s,n), which gives the position in bytes of the start of the n-th character of s.
So try this:
print(string.sub(s,utf8.offset(s,3),-1))
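To make the difference between bytes and characters concrete, here is a small sketch applied to the two strings from the question. It assumes Lua 5.3 or later (where the utf8 library exists) and a source file saved as UTF-8; the helper name drop_first is made up for this illustration and is not part of any library.

```lua
-- Hypothetical helper: drop the first n characters (not bytes) of a UTF-8 string.
local function drop_first(s, n)
  -- Byte position where character n+1 starts; nil if s has fewer than n characters.
  local start = utf8.offset(s, n + 1)
  if not start then
    return ""
  end
  return string.sub(s, start, -1)
end

for _, s in ipairs({ "ПРИВЕТ", "HELLO." }) do
  -- #s counts bytes, utf8.len counts characters.
  print(#s, utf8.len(s), utf8.offset(s, 3), drop_first(s, 2))
end
-- prints:
-- 12  6  5  ИВЕТ
-- 6   6  3  LLO.
```

For "ПРИВЕТ" the third character starts at byte 5, because each Cyrillic letter takes two bytes; for the ASCII string the byte and character positions coincide.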
You can define utf8.sub as follows:
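function utf8.sub(s,i,j)
  -- byte position where character i starts
  i=utf8.offset(s,i)
  -- the byte just before character j+1 is the last byte of character j
  j=utf8.offset(s,j+1)-1
  return string.sub(s,i,j)
end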
(This code only works for positive j. See http://lua-users.org/lists/lua-l/2014-04/msg00590.html for the general case.)
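One possible way to handle negative and out-of-range indices is sketched below. This is my own sketch and not necessarily what the linked post does: it converts negative indices using utf8.len, clamps the range the way string.sub does, and only then calls utf8.offset. It assumes Lua 5.3 or later and valid UTF-8 input; the name utf8_sub is a made-up local helper, chosen so the standard utf8 table is left untouched.

```lua
-- Hypothetical helper: like string.sub, but i and j count characters, not bytes.
local function utf8_sub(s, i, j)
  local len = utf8.len(s)
  if not len then
    error("invalid UTF-8 string")
  end
  i = i or 1
  j = j or -1
  -- Negative indices count from the end, as in string.sub.
  if i < 0 then i = len + i + 1 end
  if j < 0 then j = len + j + 1 end
  -- Clamp to the valid character range.
  if i < 1 then i = 1 end
  if j > len then j = len end
  if i > j then
    return ""
  end
  -- Here 1 <= i <= j <= len, so both offsets exist
  -- (utf8.offset(s, len + 1) is #s + 1).
  local first = utf8.offset(s, i)
  local last = utf8.offset(s, j + 1) - 1
  return string.sub(s, first, last)
end

print(utf8_sub("ПРИВЕТ", 3))      -- ИВЕТ
print(utf8_sub("ПРИВЕТ", -2))     -- ЕТ
print(utf8_sub("ПРИВЕТ", 2, -2))  -- РИВЕ
print(utf8_sub("HELLO.", 3))      -- LLO.
```

Calling utf8.len first costs an extra pass over the string, but it keeps the index handling simple and makes invalid input fail early instead of producing a nil offset.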