I have some large JSON files I want to parse, and I want to avoid loading all of the data into memory at once. I'd like a function/loop that can return each character to me one at a time.
I found this example for iterating over words in a string, and the ScanRunes function in the bufio package looks like it could return a character at a time. I also had the ReadRune function from bufio mostly working, but that felt like a pretty heavy approach.
I compared 3 approaches. All used a loop to pull content from either a bufio.Reader or a bufio.Scanner.
1. Read runes in a loop using .ReadRune on a bufio.Reader. Checked for errors from the call to .ReadRune.
2. Read bytes from a bufio.Scanner after calling .Split(bufio.ScanRunes) on the scanner. Called .Scan and .Bytes on each iteration, checking the .Scan call for errors.
3. Same as #2, but read text from the bufio.Scanner instead of bytes, using .Text. Instead of joining a slice of runes with string([]runes), I joined a slice of strings with strings.Join([]strings, "") to form the final blobs of text.
The timing for 10 runs of each on a 23 MB JSON file was:
Approach 1: 0.65 s
Approach 2: 2.40 s
Approach 3: 0.97 s
So it looks like ReadRune is not too bad after all. It also results in a smaller, less verbose call, because each rune is fetched in 1 operation (.ReadRune) instead of 2 (.Scan and .Bytes).
fgetc() and fputc() in C. fgetc() reads a single character at a time from a file. It returns the character read as an unsigned char converted to an int (for ASCII input, the character's ASCII code), or EOF at end of file or on error.
The simplest way of reading a text or binary file in Go is to use the ReadFile() function from the os package. This function reads the entire content of the file into a byte slice, so you should be careful when trying to read a large file - in this case, you should read the file line by line or in chunks.
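Reading in chunks, as suggested above, could look roughly like the sketch below. The names countChunks and countChunksInString are invented for this example, and the 4-byte chunk size is only there to make the behaviour visible; a real program would open the file with os.Open and use a much larger buffer.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// countChunks reads src through a fixed-size buffer and returns the total
// number of bytes read and the number of non-empty chunks seen.
func countChunks(src io.Reader, chunkSize int) (int, int, error) {
	buf := make([]byte, chunkSize)
	total, chunks := 0, 0
	for {
		n, err := src.Read(buf)
		if n > 0 {
			// buf[:n] holds the current chunk; process it here.
			total += n
			chunks++
		}
		if err == io.EOF {
			return total, chunks, nil
		}
		if err != nil {
			return total, chunks, err
		}
	}
}

// countChunksInString is a demo helper that feeds a string to countChunks.
func countChunksInString(s string, chunkSize int) (int, int, error) {
	return countChunks(strings.NewReader(s), chunkSize)
}

func main() {
	total, chunks, err := countChunksInString("hello world", 4)
	fmt.Println(total, chunks, err)
}
```

Note that a chunk boundary can fall in the middle of a multi-byte UTF-8 character, so raw chunks are not a safe way to iterate over characters; bufio.Reader.ReadRune handles that case for you.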
Read the entire file in Go: to read a file's entire content, use the ReadFile() function from the os package (ioutil.ReadFile is the older equivalent, deprecated since Go 1.16). This function reads the entire content of the file into a byte slice, so it is sufficient for small files but should not be used to read a large file.
Getting the first character: to access a string's first character, we can use a slice expression in Go. Passing [0:1] to the slice expression starts the extraction at position 0 and ends at position 1 (which is excluded). Note: this syntax works only for ASCII characters, because a slice expression on a string indexes bytes, not characters.
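If the string may contain non-ASCII text, one possible way to get the first character is to decode the first rune and slice by its byte size. The helper name firstChar is made up for this sketch.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstChar returns the first UTF-8 encoded character of s as a string,
// or "" if s is empty.
func firstChar(s string) string {
	// DecodeRuneInString reports how many bytes the first rune occupies.
	_, size := utf8.DecodeRuneInString(s)
	return s[:size]
}

func main() {
	fmt.Println(firstChar("dog"))
	fmt.Println(firstChar("Быстрая"))
	// Slicing [0:1] keeps only the first byte of the two-byte "Б".
	fmt.Println("Быстрая"[0:1] == "Б")
}
```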
Just read each rune one by one in a loop. See the example:
package main

import (
	"bufio"
	"fmt"
	"io"
	"log"
	"strings"
)

var text = `
The quick brown fox jumps over the lazy dog #1.
Быстрая коричневая лиса перепрыгнула через ленивую собаку.
`

func main() {
	r := bufio.NewReader(strings.NewReader(text))
	for {
		// ReadRune returns the next rune and its size in bytes.
		c, sz, err := r.ReadRune()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%q [%d]\n", string(c), sz)
	}
}
This code reads runes from the input. No cast is necessary, and it is iterator-like:
package main

import (
	"bufio"
	"fmt"
	"strings"
)

func main() {
	in := `{"sample":"json string"}`
	s := bufio.NewScanner(strings.NewReader(in))
	// ScanRunes makes each call to Scan yield one UTF-8 encoded rune.
	s.Split(bufio.ScanRunes)
	for s.Scan() {
		fmt.Println(s.Text())
	}
}