I have seen several posts online that loosely discuss why one should use bufio.Scanner instead of bufio.Reader.
I don't know whether my test case is relevant, but I decided to compare the two by reading 1,000,000 lines from a text file:
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"time"
	//"bytes"
)

func main() {
	fileName := "testfile.txt"

	// Create 1,000,000 integers as strings
	numItems := 1000000
	startInitStringArray := time.Now()
	var input [1000000]string
	//var input []string
	for i := 0; i < numItems; i++ {
		input[i] = strconv.Itoa(i)
		//input = append(input,strconv.Itoa(i))
	}
	elapsedInitStringArray := time.Since(startInitStringArray)
	fmt.Printf("Took %s to populate string array.\n", elapsedInitStringArray)

	// Write to a file
	fo, _ := os.Create(fileName)
	for i := 0; i < numItems; i++ {
		fo.WriteString(input[i] + "\n")
	}
	fo.Close()

	// Use reader
	openedFile, _ := os.Open(fileName)
	startReader := time.Now()
	reader := bufio.NewReader(openedFile)
	for i := 0; i < numItems; i++ {
		reader.ReadLine()
	}
	elapsedReader := time.Since(startReader)
	fmt.Printf("Took %s to read file using reader.\n", elapsedReader)
	openedFile.Close()

	// Use scanner
	openedFile, _ = os.Open(fileName)
	startScanner := time.Now()
	scanner := bufio.NewScanner(openedFile)
	for i := 0; i < numItems; i++ {
		scanner.Scan()
		scanner.Text()
	}
	elapsedScanner := time.Since(startScanner)
	fmt.Printf("Took %s to read file using scanner.\n", elapsedScanner)
	openedFile.Close()
}
A pretty average output I receive on the timings looks like this:
Took 44.1165ms to populate string array.
Took 17.0465ms to read file using reader.
Took 23.0613ms to read file using scanner.
I am curious, when is it better to use a reader vs. a scanner, and is it based on performance, or functionality?
It's a flawed benchmark. They are not doing the same thing.
func (b *Reader) ReadLine() (line []byte, isPrefix bool, err error)
returns []byte.
func (s *Scanner) Text() string
returns string([]byte).
To be comparable, use
func (s *Scanner) Bytes() []byte
It's a flawed benchmark. It reads short strings, the integers from "0\n" to "999999\n". What real-world data set looks like that?
In the real world we read Shakespeare: http://www.gutenberg.org/ebooks/100: Plain Text UTF-8: pg100.txt.
Took 2.973307ms to read file using reader. size: 5340315 lines: 124787
Took 2.940388ms to read file using scanner. size: 5340315 lines: 124787