I'm having some trouble while reading a file which has a fixed column length format. Some columns may contain umlauts.
Umlauts seem to use 2 bytes instead of one. This is not the behaviour I was expecting. Is there any kind of function which returns a substring? Slice does not seem to work in this case.
Here's some sample code:
http://play.golang.org/p/ZJ1axy7UXe
umlautsString := "Rhön"
fmt.Println(len(umlautsString))
fmt.Println(umlautsString[0:4])
Prints:
5
Rhö
In go, a slice of a string counts bytes, not runes. This is why "Rhön"[0:3]
gives you Rh
and the first byte of ö
.
Characters encoded in UTF-8 are represented as runes because UTF-8 encodes characters in more than one byte (up to four bytes) to provide a bigger range of characters.
If you want to slice a string with the []
syntax, convert the string to []rune
before.
Example (on play):
umlautsString := "Rhön"
runes = []rune(umlautsString)
fmt.Println(string(runes[0:3])) // Rhö
Noteworthy: This golang blog post about string representation in go.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With