I need to read Unicode files that may or may not contain a byte-order mark. I could of course check the first few bytes of the file myself and discard a BOM if I find one. But before I do, is there a standard way of doing this, either in the core libraries or in a third-party package?
I thought I would add a way to strip the Byte Order Mark from a string, rather than working with the raw bytes directly as the approaches above do.
package main

import (
	"fmt"
	"strings"
)

func main() {
	s := "\uFEFF is a string that starts with a Byte Order Mark"
	fmt.Printf("before: '%v' (len=%v)\n", s, len(s))

	// U+FEFF encoded as UTF-8 is the 3-byte sequence EF BB BF.
	const byteOrderMark = "\uFEFF"
	if strings.HasPrefix(s, byteOrderMark) {
		fmt.Println("Found leading Byte Order Mark sequence!")
		s = strings.TrimPrefix(s, byteOrderMark)
	}
	fmt.Printf("after: '%v' (len=%v)\n", s, len(s))
}
Other functions from the strings package work with the BOM string in the same way, since in Go it is just an ordinary three-byte UTF-8 string.
This is what it prints (the BOM in the first line is invisible, but it accounts for the extra 3 bytes):
before: ' is a string that starts with a Byte Order Mark' (len=50)
Found leading Byte Order Mark sequence!
after: ' is a string that starts with a Byte Order Mark' (len=47)
Cheers!