As example: I want remove the first 2 letters from the string "ПРИВЕТ" and "HELLO." one of these are containing only two-byted unicode symbols.
Trying to use string.sub("ПРИВЕТ") and string.sub("HELLO.")
Got "РИВЕТ" and "LLO.".
string.sub() removed 2 BYTES(not chars) from these strings. So i want to know how to get the removing of the chars
Something, like utf8.sub()
The key standard function for this task is utf8.offset(s,n), which gives the position in bytes of the start of the n-th character of s.
So try this:
print(string.sub(s,utf8.offset(s,3),-1))
You can define utf8.sub as follows:
function utf8.sub(s,i,j)
    i=utf8.offset(s,i)
    j=utf8.offset(s,j+1)-1
    return string.sub(s,i,j)
end
(This code only works for positive j. See http://lua-users.org/lists/lua-l/2014-04/msg00590.html for the general case.)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With