
What is the simplest way to get a UTF-8 substring in Julia?

Tags: utf-8, julia

A UTF-8 string in Julia cannot be sliced by character position with the range-indexing operator, because the range refers to byte indices of the string, not to characters. For example:

s = "ポケットモンスター"
s[1:4]

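s[1:4] returns "ポケ", not "ポケット", because each of these characters occupies three bytes and byte index 4 is where the second character starts.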
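The byte-versus-character distinction can be checked directly. The following is a minimal sketch, assuming Julia 1.x; it only inspects the string from the example above.

```julia
s = "ポケットモンスター"

n_chars = length(s)               # number of characters: 9
n_bytes = sizeof(s)               # number of bytes: 27 (3 per character here)
starts  = collect(eachindex(s))   # byte indices where characters begin: 1, 4, 7, ...

# Both endpoints are byte indices, so this covers the characters
# starting at bytes 1 and 4 only.
two = s[1:4]                      # "ポケ"
```

In Julia 1.x, a range endpoint that falls in the middle of a character (for example s[1:5]) raises a StringIndexError rather than returning a partial character.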

I would like to know the simplest and most readable way to get a UTF-8 substring in Julia.

Pisit Makpaisit asked Feb 08 '23 21:02


2 Answers

Perhaps this question calls attention to some missing functions in the standard string library (which is supposed to undergo changes in the next version of Julia). In the meantime, if we define:

substr(s,i,j) = s[chr2ind(s,i):chr2ind(s,j)]

Then,

substr(s,1,4)

would be "ポケット".
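For readers on Julia 1.x, where chr2ind is no longer available, here is a hedged sketch of the same idea. It assumes nextind(s, 0, n), which returns the byte index of the n-th character, as the replacement; substr remains a user-defined helper, not a standard library function.

```julia
# User-defined helper (not part of Base): translate character positions
# i..j into byte indices, then slice as usual.
substr(s, i, j) = s[nextind(s, 0, i):nextind(s, 0, j)]

s = "ポケットモンスター"

a = substr(s, 1, 4)   # "ポケット"
b = substr(s, 5, 9)   # "モンスター"

# When only a prefix or suffix is needed, Base already counts in characters:
c = first(s, 4)       # "ポケット"
d = last(s, 5)        # "モンスター"
```

The helper walks the string from the start to locate each index, so it costs time proportional to the position; for occasional substrings that is usually acceptable.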

Dan Getz answered Feb 15 '23 03:02


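If you are going to be doing this a lot, you might want to consider using UTF32String instead of UTF8String, converting back to UTF8String only if necessary once you are finished. UTF-32 stores every character in a fixed-width 32-bit unit, so character positions can be used directly as indices.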
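In Julia 1.x, where UTF32String is no longer part of Base, a similar fixed-width effect can be sketched by collecting the string into a Vector{Char}. This is an illustrative substitute for the suggestion above, not the same type.

```julia
s = "ポケットモンスター"

chars = collect(s)          # Vector{Char}: one element per character
head  = chars[1:4]          # plain array slicing, indexed by character position
tail  = chars[5:end]

# Convert back to a String only when the result is needed as text.
head_str = join(head)       # "ポケット"
tail_str = join(tail)       # "モンスター"
```

The trade-off is memory: each Char takes four bytes, compared with three bytes per character for this string in UTF-8.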

Scott Jones answered Feb 15 '23 04:02