Reading files with a BOM in Go

Question

I need to read Unicode files that may or may not contain a byte-order mark. I could of course check the first few bytes of the file myself, and discard a BOM if I find one. But before I do, is there any standard way of doing this, either in the core libraries or a third party?

warrens · Accepted Answer

I thought I would add here the way to strip the Byte Order Mark sequence from a string -- rather than messing around with bytes directly (as shown above).

package main

import (
    "fmt"
    "strings"
)

func main() {
    s := "\uFEFF is a string that starts with a Byte Order Mark"
    fmt.Printf("before: '%v' (len=%v)
", s, len(s))

    ByteOrderMarkAsString := string('\uFEFF')

    if strings.HasPrefix(s, ByteOrderMarkAsString) {

        fmt.Printf("Found leading Byte Order Mark sequence!
")
        
        s = strings.TrimPrefix(s, ByteOrderMarkAsString)
    }
    fmt.Printf("after: '%v' (len=%v)
", s, len(s)) 
}

Other "strings" functions should work as well.

And this is what prints out:

before: ' is a string that starts with a Byte Order Mark (len=50)'
Found leading Byte Order Mark sequence!
after: ' is a string that starts with a Byte Order Mark (len=47)'

Cheers!

Reading files with a BOM in Go

Tags:

unicode

go

byte-order-mark

Marcus Downing

1 Answers

warrens

Recent Activity

Donate For Us

Reading files with a BOM in Go

Tags:

unicode

go

byte-order-mark

Marcus Downing

1 Answers

warrens

Related questions

Recent Activity

Donate For Us