What is the simplest way to get a UTF-8 substring in Julia?

Question

A UTF-8 string in Julia cannot be sliced by character with the slice operator, because the operator indexes the bytes of the string, not its characters. For example:

s = "ポケットモンスター"
s[1:4]

s[1:4] returns "ポケ", not "ポケット".
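To make the byte-versus-character distinction concrete, here is a small sketch, assuming Julia 1.x. Each katakana character in this string takes 3 bytes in UTF-8, so the valid string indices are 1, 4, 7, and so on, and the range 1:4 covers only the first two characters.

```julia
s = "ポケットモンスター"

length(s)             # 9  (number of characters)
sizeof(s)             # 27 (number of bytes: 9 characters * 3 bytes each)
collect(eachindex(s)) # byte index where each character starts: 1, 4, 7, ...
s[1:4]                # "ポケ": the characters starting at byte indices 1 and 4
```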

I would like to know the simplest and most readable way to get a UTF-8 substring in Julia.

Dan Getz · Accepted Answer

Perhaps this question calls attention to some missing functions in the standard string library (which is supposed to undergo changes in the next version of Julia). In the meantime, if we define:

substr(s,i,j) = s[chr2ind(s,i):chr2ind(s,j)]

Then,

substr(s,1,4)

would be "ポケット".
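For readers on Julia 1.x, where chr2ind is no longer available, the same helper can be sketched with nextind. This is an adaptation under that assumption, not part of the original answer: nextind(s, 0, n) returns the byte index at which the n-th character starts. For a prefix, the built-in first(s, n) may be the most readable option.

```julia
# Sketch for Julia 1.x; `substr` is the helper name used in the answer above.
# nextind(s, 0, n) gives the byte index of the n-th character of s.
substr(s, i, j) = s[nextind(s, 0, i):nextind(s, 0, j)]

s = "ポケットモンスター"

substr(s, 1, 4)   # "ポケット"
substr(s, 5, 9)   # "モンスター"
first(s, 4)       # "ポケット" (built-in, first n characters)
```

Note that each call scans the string from the start to find the byte positions, so the cost grows with the character offsets.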

Scott Jones · Answer

You might want to consider using UTF32String instead of UTF8String, if you are going to be doing this a lot, and only converting to UTF8String if necessary, when you are finished.
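In Julia 1.x, UTF32String is, as far as I know, no longer part of Base. A comparable approach, offered here only as a sketch and not as the original answer's code, is to collect the string into a Vector{Char}. That gives plain character indexing, and you convert back to a String only when you are finished.

```julia
s = "ポケットモンスター"

chars = collect(s)       # Vector{Char}, one element per character
sub = join(chars[1:4])   # back to a String: "ポケット"
```

The trade-off is memory: each Char takes 4 bytes, so this is worthwhile mainly when you do many character-indexed operations on the same string.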

Tags: utf-8, julia

Pisit Makpaisit
