Skipping ahead n codepoints while iterating through a unicode string in Go

Tags:

go

In Go, iterating over a string using

for i := 0; i < len(myString); i++ {
    doSomething(myString[i])
}

only accesses individual bytes in the string, whereas iterating over a string via

for i, c := range myString {
    doSomething(c)
}

iterates over individual Unicode codepoints (called runes in Go), which may span multiple bytes.
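As a rough illustration of the difference between the two loops, here is a minimal sketch. The helper names byteCount, runeCount and runeOffsets are made up for this example and are not part of the question.

```go
package main

import "fmt"

// byteCount (hypothetical helper) counts the iterations of an index-based
// loop, which visits one byte at a time.
func byteCount(s string) int {
	n := 0
	for i := 0; i < len(s); i++ {
		n++
	}
	return n
}

// runeCount (hypothetical helper) counts the iterations of a range loop,
// which visits one Unicode codepoint at a time.
func runeCount(s string) int {
	n := 0
	for range s {
		n++
	}
	return n
}

// runeOffsets (hypothetical helper) collects the index that range yields
// for each codepoint. The index is a byte offset, not a codepoint count.
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	s := "h\u00e9llo" // "héllo"; 'é' occupies two bytes in UTF-8

	fmt.Println(byteCount(s), runeCount(s)) // 6 5
	fmt.Println(runeOffsets(s))             // [0 1 3 4 5]
}
```

Note that the index produced by range jumps from 1 to 3 across the two-byte 'é', which is why it cannot simply be treated as a codepoint counter.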

My question is: how does one go about jumping ahead while iterating over a string with range myString? continue can jump ahead by one Unicode codepoint, but it's not possible to just do i += 3, for instance, if you want to jump ahead three codepoints: i is a byte offset, and range overwrites it on every iteration anyway. So what would be the most idiomatic way to advance forward by n codepoints?

I asked this question on the golang-nuts mailing list, and it was answered, courtesy of some of the helpful folks on the list. Someone messaged me, however, suggesting I create a self-answered question on Stack Overflow for this, to save the next person with the same issue some trouble. That's what this is.

LogicChains asked Feb 13 '23 04:02



1 Answer

I'd consider avoiding the conversion to []rune and coding this directly.

skip := 0
for _, c := range myString {
    if skip > 0 {
        skip--
        continue
    }
    skip = doSomething(c)
}

Skipping runes one by one like this looks inefficient, but it's the same amount of work as the conversion to []rune would be, since both have to decode every codepoint. The advantage of this code is that it avoids allocating the rune slice, which uses 4 bytes per rune and so can be up to 4 times larger than the original string (exactly 4 times for pure ASCII text, less the more multi-byte codepoints you have). Of course, converting to []rune is a bit simpler, so you may prefer that.

Paul Hankin answered Apr 26 '23 02:04
