
Indexing string as chars

Tags: string, unicode, go

The elements of strings have type byte and may be accessed using the usual indexing operations.

How can I get an element of a string as a character?

"some"[1] -> "o"

ceth asked Oct 29 '12 10:10


2 Answers

The simplest solution is to convert the string to a slice of runes:

var runes = []rune("someString")
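Indexing the resulting slice gives a rune, and converting that rune with string(...) gives the one-character string the question asks for. The following is a minimal sketch, not a standard API: the helper name charAt is hypothetical, and the choice to return false for an out-of-range index is an assumption.

```go
package main

import "fmt"

// charAt is a hypothetical helper (not part of the standard library).
// It returns the i-th code point of s as a string, or false if i is
// out of range.
func charAt(s string, i int) (string, bool) {
	runes := []rune(s) // decodes the whole string: O(n) time and memory
	if i < 0 || i >= len(runes) {
		return "", false
	}
	return string(runes[i]), true
}

func main() {
	c, ok := charAt("some", 1)
	fmt.Println(c, ok) // o true

	c, ok = charAt("日本語", 1)
	fmt.Println(c, ok) // 本 true

	// Plain indexing yields a byte, not a character.
	fmt.Println("some"[1]) // 111
}
```

The conversion copies and decodes the entire string, so for repeated lookups it is cheaper to convert once and keep the slice.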

Note that when you iterate over a string with range, you don't need the conversion. See this example from Effective Go:

for pos, char := range "日本語" {
    fmt.Printf("character %c starts at byte position %d\n", char, pos)
}

This prints:

character 日 starts at byte position 0
character 本 starts at byte position 3
character 語 starts at byte position 6

Denys Séguret answered Oct 22 '22 16:10


Go strings are usually, but not necessarily, UTF-8 encoded. When they do hold Unicode text, the term "char[acter]" is pretty complex, and there is no general, unique bijection between runes (code points) and Unicode characters.

Anyway, one can easily work with code points (runes) by converting the string to a slice and indexing into it:

package main

import "fmt"

func main() {
        utf8 := "Hello, 世界"
        runes := []rune(utf8)
        fmt.Printf("utf8:% 02x\nrunes: %#v\n", []byte(utf8), runes)
}

Also here: http://play.golang.org/p/qWVSA-n93o

Note: Often the desire to access Unicode "characters" by index is a design mistake. Most textual data is processed sequentially.
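When a single position is needed only occasionally, a sequential scan avoids allocating a rune slice at all. This is a rough sketch under that assumption; runeAt is a hypothetical helper name, and it counts code points, not user-perceived characters.

```go
package main

import "fmt"

// runeAt is a hypothetical helper (not part of the standard library).
// It walks s once and returns the n-th code point together with the
// byte offset where it starts; ok is false if s has fewer than n+1
// code points or n is negative.
func runeAt(s string, n int) (r rune, offset int, ok bool) {
	i := 0
	for pos, c := range s { // range decodes one code point per step
		if i == n {
			return c, pos, true
		}
		i++
	}
	return 0, 0, false
}

func main() {
	r, off, ok := runeAt("Hello, 世界", 8)
	fmt.Printf("%c %d %v\n", r, off, ok) // 界 10 true
}
```

Each call is O(n), so code that needs many random lookups is better served by the one-time []rune conversion shown above.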

zzzz answered Oct 22 '22 14:10