
Umlauts and slices

Tags:

go

I'm having some trouble while reading a file which has a fixed column length format. Some columns may contain umlauts.

Umlauts seem to use 2 bytes instead of one. This is not the behaviour I was expecting. Is there any kind of function which returns a substring? Slice does not seem to work in this case.

Here's some sample code:

http://play.golang.org/p/ZJ1axy7UXe

umlautsString := "Rhön"
fmt.Println(len(umlautsString))
fmt.Println(umlautsString[0:4])

Prints:

5
Rhö
fourcube asked Oct 17 '13 16:10


1 Answer

In Go, slicing a string counts bytes, not runes. This is why "Rhön"[0:3] gives you Rh and only the first byte of ö.

A rune is Go's type for a single Unicode code point. UTF-8 encodes each code point in one to four bytes: ASCII characters take one byte, while ö takes two. That is why len("Rhön") is 5 even though the string has four characters.
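To make the byte/rune difference concrete, here is a minimal sketch using only the standard library. The helper name byteAndRuneCount is made up for illustration, and it assumes the ö is the precomposed character (U+00F6), as in the question's sample.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCount is a hypothetical helper: it returns the length of s
// in bytes and the number of runes (code points) it contains.
func byteAndRuneCount(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	s := "Rhön"
	b, r := byteAndRuneCount(s)
	fmt.Println(b, r) // 5 4

	// Ranging over a string decodes one rune per iteration;
	// i is the byte offset where that rune starts.
	for i, c := range s {
		fmt.Printf("offset %d: %c (%d bytes)\n", i, c, utf8.RuneLen(c))
	}
}
```

Note that the byte offset jumps from 2 to 4 after ö, which is exactly why a byte slice like [0:3] cuts the character in half.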

If you want to slice a string by characters with the [] syntax, convert the string to []rune first. Example (on play):

umlautsString := "Rhön"
runes := []rune(umlautsString) // := declares runes; plain = would not compile here
fmt.Println(string(runes[0:3])) // Rhö
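For the fixed-column file from the question, the same conversion could be wrapped in a small helper. This is only a sketch under assumptions: runeSubstr and the sample line are hypothetical, and it assumes the column widths are counted in code points (so combining characters, if the file contains any, would still throw the columns off).

```go
package main

import "fmt"

// runeSubstr is a hypothetical helper: it returns the runes of s in the
// half-open range [start, end), clamping both bounds to the string.
func runeSubstr(s string, start, end int) string {
	r := []rune(s)
	if start < 0 {
		start = 0
	}
	if end > len(r) {
		end = len(r)
	}
	if start >= end {
		return ""
	}
	return string(r[start:end])
}

func main() {
	// Made-up record: a 10-character column followed by a 5-character one.
	line := "Rhön      Fulda"
	fmt.Printf("%q\n", runeSubstr(line, 0, 10))  // "Rhön      "
	fmt.Printf("%q\n", runeSubstr(line, 10, 15)) // "Fulda"
}
```

Each call converts the whole line to []rune, so when extracting many columns from one line it is probably cheaper to convert once and slice the resulting []rune directly.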

Noteworthy: the Go blog has a post about how strings are represented in Go.

nemo answered Oct 21 '22 11:10