A UTF-8 string in Julia cannot be sliced by character with the range operator, because the range indexes the bytes of the string, not its characters. For example:
s = "ポケットモンスター"
s[1:4]
Here s[1:4] yields "ポケ", not "ポケット".
I would like to know the simplest and most readable way to get a UTF-8 substring in Julia.
Perhaps this question calls attention to some missing functions in the standard string library (which is supposed to undergo changes in the next version of Julia). In the meantime, if we define:
substr(s,i,j) = s[chr2ind(s,i):chr2ind(s,j)]
Then,
substr(s,1,4)
returns "ポケット".
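For readers on Julia 1.0 or later, where `chr2ind` is no longer available, the same idea can be sketched with `nextind`. This is only an illustration, not the original answerer's code: `substr` is a hypothetical helper name carried over from above, and the sketch assumes `1 <= i <= j <= length(s)`.

```julia
# Hypothetical helper: characters i through j (1-based, inclusive) of s.
# nextind(s, 0, n) gives the byte index at which the n-th character starts.
substr(s, i, j) = s[nextind(s, 0, i):nextind(s, 0, j)]

s = "ポケットモンスター"

a = substr(s, 1, 4)   # "ポケット"
b = first(s, 4)       # "ポケット" (built-in shortcut for a prefix)
```

When only a prefix or suffix is needed, `first(s, n)` and `last(s, n)` are probably the most readable choice, since they count characters rather than bytes.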
You might want to consider using UTF32String instead of UTF8String if you are going to be doing this a lot, converting back to UTF8String only if necessary, when you are finished.
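The fixed-width idea can be sketched without relying on `UTF32String`, which may not be available in every Julia version. The version below assumes a recent Julia and uses a `Vector{Char}` as a stand-in: every element is one character, so ordinary integer ranges select characters directly. The variable names are illustrative only.

```julia
s = "ポケットモンスター"

# Convert once to a fixed-width representation: one element per character.
chars = collect(s)

# Plain ranges now index characters, not bytes.
head = join(chars[1:4])   # "ポケット"
tail = join(chars[5:9])   # "モンスター"
```

The conversion costs one pass over the string, so it tends to pay off only when many character-indexed slices are taken from the same text.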