 

How to read a file, abort with error if it's not valid UTF-8?

Tags:

go

In Go, I want to read a file line by line into strings or []runes.

The file should be encoded in UTF-8, but my program shouldn't trust it. If it contains invalid UTF-8, I want to properly handle the error.

There is bytes.Runes(s []byte) []rune, but that has no error return value. Will it panic on encountering invalid UTF-8?

ke. asked Dec 15 '12 13:12



1 Answer

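For example, read the line with bufio.Reader.ReadString, which does no UTF-8 validation, and then check it explicitly with utf8.ValidString: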

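package main

import (
    "bufio"
    "fmt"
    "os"
    "strings"
    "unicode/utf8"
)

func main() {
    // Create a test file whose only line is the invalid UTF-8 byte 0xFF.
    // os.WriteFile replaces the deprecated ioutil.WriteFile.
    tFile := "text.txt"
    t := []byte{'\xFF', '\n'}
    if err := os.WriteFile(tFile, t, 0666); err != nil {
        fmt.Println(err)
        os.Exit(1)
    }
    f, err := os.Open(tFile)
    if err != nil {
        fmt.Println(err)
        os.Exit(1)
    }
    defer f.Close()
    // ReadString returns the raw bytes unchanged; it neither
    // validates nor replaces invalid UTF-8.
    r := bufio.NewReader(f)
    s, err := r.ReadString('\n')
    if err != nil {
        fmt.Println(err)
        os.Exit(1)
    }
    s = strings.TrimRight(s, "\n")
    fmt.Println(t, s, []byte(s))
    // Validate the line explicitly and report invalid UTF-8.
    if !utf8.ValidString(s) {
        fmt.Println("!utf8.ValidString")
    }
}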

Output:

[255 10] � [255]
!utf8.ValidString
peterSO answered Oct 05 '22 08:10
