
How to read a file character by character in Go

Tags:

json

io

parsing

go

I have some large JSON files I want to parse, and I want to avoid loading all of the data into memory at once. I'd like a function or loop that gives me one character at a time.

I found this example for iterating over words in a string, and the ScanRunes function in the bufio package looks like it could return a character at a time. I also had the ReadRune function from bufio mostly working, but that felt like a pretty heavy approach.

EDIT

I compared 3 approaches. All used a loop to pull content from either a bufio.Reader or a bufio.Scanner.

  1. Read runes in a loop using .ReadRune on a bufio.Reader. Checked for errors from the call to .ReadRune.
  2. Read bytes from a bufio.Scanner after calling .Split(bufio.ScanRunes) on the scanner. Called .Scan and .Bytes on each iteration, checking .Scan call for errors.
  3. Same as #2, but read text from the bufio.Scanner with .Text instead of reading bytes. Instead of joining a slice of runes with string([]rune), I joined a slice of strings with strings.Join([]string, "") to form the final blobs of text.

The timing for 10 runs of each on a 23 MB JSON file was:

  1. 0.65 s
  2. 2.40 s
  3. 0.97 s

So it looks like ReadRune is not too bad after all. It also results in shorter, less verbose code, because each rune is fetched in 1 operation (.ReadRune) instead of 2 (.Scan and .Bytes).

turtlemonvh asked Aug 06 '15 14:08




2 Answers

Just read each rune one by one in a loop. See the example:

package main

import (
    "bufio"
    "fmt"
    "io"
    "log"
    "strings"
)

var text = `
The quick brown fox jumps over the lazy dog #1.
Быстрая коричневая лиса перепрыгнула через ленивую собаку.
`

func main() {
    r := bufio.NewReader(strings.NewReader(text))
    for {
        // ReadRune decodes one UTF-8 character and reports its size in bytes.
        c, sz, err := r.ReadRune()
        if err == io.EOF {
            break // end of input
        }
        if err != nil {
            log.Fatal(err)
        }
        fmt.Printf("%q [%d]\n", string(c), sz)
    }
}
tez answered Sep 23 '22 05:09



This code reads runes from the input. No conversion is necessary, and it is iterator-like:

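package main

import (
    "bufio"
    "fmt"
    "log"
    "strings"
)

func main() {
    in := `{"sample":"json string"}`

    s := bufio.NewScanner(strings.NewReader(in))
    // Split on UTF-8 encoded runes instead of the default lines.
    s.Split(bufio.ScanRunes)

    for s.Scan() {
        fmt.Println(s.Text()) // each token is one rune, as a string
    }
    // Scan returns false on both EOF and error; Err tells them apart.
    if err := s.Err(); err != nil {
        log.Fatal(err)
    }
}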
Alex Netkachov answered Sep 26 '22 05:09
