
bufio.Reader and bufio.Scanner functionality and performance

Tags:

go

I have seen several posts online that loosely discuss why one should use bufio.Scanner instead of bufio.Reader.

I don't know whether my test case is relevant, but I decided to compare the two by reading 1,000,000 lines from a text file:

package main

import (
    "bufio"
    "fmt"
    "log"
    "os"
    "strconv"
    "time"
)

func main() {
    fileName := "testfile.txt"

    // Create 1,000,000 integers as strings
    const numItems = 1000000
    startInitStringArray := time.Now()

    var input [numItems]string
    for i := 0; i < numItems; i++ {
        input[i] = strconv.Itoa(i)
    }

    elapsedInitStringArray := time.Since(startInitStringArray)
    fmt.Printf("Took %s to populate string array.\n", elapsedInitStringArray)

    // Write to a file, one integer per line
    fo, err := os.Create(fileName)
    if err != nil {
        log.Fatal(err)
    }
    for i := 0; i < numItems; i++ {
        if _, err := fo.WriteString(input[i] + "\n"); err != nil {
            log.Fatal(err)
        }
    }
    if err := fo.Close(); err != nil {
        log.Fatal(err)
    }

    // Use reader
    openedFile, err := os.Open(fileName)
    if err != nil {
        log.Fatal(err)
    }

    startReader := time.Now()
    reader := bufio.NewReader(openedFile)

    for i := 0; i < numItems; i++ {
        // The returned []byte is discarded; only the read is timed.
        if _, _, err := reader.ReadLine(); err != nil {
            log.Fatal(err)
        }
    }
    elapsedReader := time.Since(startReader)
    fmt.Printf("Took %s to read file using reader.\n", elapsedReader)
    openedFile.Close()

    // Use scanner
    openedFile, err = os.Open(fileName)
    if err != nil {
        log.Fatal(err)
    }

    startScanner := time.Now()
    scanner := bufio.NewScanner(openedFile)

    for i := 0; i < numItems; i++ {
        if !scanner.Scan() {
            log.Fatalf("scan stopped early: %v", scanner.Err())
        }
        // Text converts the token to a string; the result is discarded.
        _ = scanner.Text()
    }

    elapsedScanner := time.Since(startScanner)
    fmt.Printf("Took %s to read file using scanner.\n", elapsedScanner)
    openedFile.Close()
}

Typical timings on my machine look like this:

Took 44.1165ms to populate string array.
Took 17.0465ms to read file using reader.
Took 23.0613ms to read file using scanner.

When is it better to use a Reader rather than a Scanner, and is the choice based on performance or on functionality?

blgrnboy asked Nov 22 '17 19:11



1 Answer

It's a flawed benchmark. They are not doing the same thing.

func (b *Reader) ReadLine() (line []byte, isPrefix bool, err error)

returns []byte.

func (s *Scanner) Text() string

returns a string, which means converting the token's []byte into a newly allocated string on every call.

For a comparable test, use

func (s *Scanner) Bytes() []byte
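As a rough illustration of that like-for-like comparison, here is a minimal sketch that does the same work with both types: read each line as a []byte, then count lines and bytes. The helper names (countWithReader, countWithScanner, timeRead) and the in-memory sample text are assumptions made for this example, not part of the original benchmark, and size here excludes line terminators.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
	"time"
)

// countWithReader reads lines with Reader.ReadLine and returns the number
// of complete lines and the total bytes in them (terminators excluded).
func countWithReader(r io.Reader) (int, int, error) {
	br := bufio.NewReader(r)
	lines, size := 0, 0
	for {
		line, isPrefix, err := br.ReadLine()
		if err == io.EOF {
			return lines, size, nil
		}
		if err != nil {
			return lines, size, err
		}
		size += len(line)
		if !isPrefix { // count a long line once, even if it arrives in pieces
			lines++
		}
	}
}

// countWithScanner does the same work with Scanner.Bytes, so neither
// side pays for a []byte-to-string conversion.
func countWithScanner(r io.Reader) (int, int, error) {
	sc := bufio.NewScanner(r)
	lines, size := 0, 0
	for sc.Scan() {
		lines++
		size += len(sc.Bytes())
	}
	return lines, size, sc.Err()
}

// timeRead runs one counting function over data and prints a single
// wall-clock measurement along with the counts.
func timeRead(label, data string, count func(io.Reader) (int, int, error)) (int, int, error) {
	start := time.Now()
	lines, size, err := count(strings.NewReader(data))
	elapsed := time.Since(start)
	if err != nil {
		return 0, 0, err
	}
	fmt.Printf("Took %s to read using %s.  size: %d lines: %d\n", elapsed, label, size, lines)
	return lines, size, nil
}

func main() {
	// Hypothetical sample text; substitute the contents of a real file.
	data := strings.Repeat("To be, or not to be, that is the question.\n", 100000)
	if _, _, err := timeRead("reader", data, countWithReader); err != nil {
		fmt.Println("reader error:", err)
	}
	if _, _, err := timeRead("scanner", data, countWithScanner); err != nil {
		fmt.Println("scanner error:", err)
	}
}
```

Both functions should report the same line and byte counts for the same input, which is a quick check that they really are doing the same thing. A single timing like this is noisy, so treat the numbers as indicative only.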

It's a flawed benchmark. It reads short strings, the integers from "0\n" to "999999\n". What real-world data set looks like that?

In the real world, we read Shakespeare: http://www.gutenberg.org/ebooks/100 (Plain Text UTF-8, pg100.txt).

Took 2.973307ms to read file using reader.   size: 5340315 lines: 124787
Took 2.940388ms to read file using scanner.  size: 5340315 lines: 124787
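On the functionality side, the ReadLine signature quoted earlier carries an isPrefix result that Scanner does not have: when a line does not fit in the Reader's buffer, ReadLine hands it back in pieces and sets isPrefix on all but the last. The sketch below shows one way to reassemble such a line. The helper name readAllLines and the deliberately tiny 16-byte buffer are assumptions chosen only to make the effect visible on a short input.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// readAllLines is a hypothetical helper. It reads input through a Reader
// with the given buffer size, joins the pieces ReadLine returns while
// isPrefix is true, and records how many ReadLine calls each line needed.
func readAllLines(input string, bufSize int) ([]string, []int, error) {
	r := bufio.NewReaderSize(strings.NewReader(input), bufSize)
	var lines []string
	var calls []int
	var current []byte
	n := 0
	for {
		chunk, isPrefix, err := r.ReadLine()
		if err == io.EOF {
			if n > 0 { // input ended in the middle of a split line
				lines = append(lines, string(current))
				calls = append(calls, n)
			}
			return lines, calls, nil
		}
		if err != nil {
			return lines, calls, err
		}
		n++
		// The slice from ReadLine is only valid until the next read,
		// so copy it before reading again.
		current = append(current, chunk...)
		if !isPrefix {
			lines = append(lines, string(current))
			calls = append(calls, n)
			current, n = nil, 0
		}
	}
}

func main() {
	// A 26-byte line cannot fit in a 16-byte buffer; "short" can.
	lines, calls, err := readAllLines("abcdefghijklmnopqrstuvwxyz\nshort\n", 16)
	if err != nil {
		fmt.Println("read error:", err)
		return
	}
	for i, line := range lines {
		fmt.Printf("%q took %d ReadLine call(s)\n", line, calls[i])
	}
}
```

With the default buffer size this only matters for lines longer than a few kilobytes, but code that ignores isPrefix will silently treat one long line as several.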
peterSO answered Oct 19 '22 03:10
