I am working on a small script which uses bufio.Scanner
and http.Request
as well as go routines to count words and lines in parallel.
package main
import (
"bufio"
"fmt"
"io"
"log"
"net/http"
"time"
)
func main() {
err := request("http://www.google.com")
if err != nil {
log.Fatal(err)
}
// just keep main alive with sleep for now
time.Sleep(2 * time.Second)
}
func request(url string) error {
res, err := http.Get(url)
if err != nil {
return err
}
go scanLineWise(res.Body)
go scanWordWise(res.Body)
return err
}
func scanLineWise(r io.Reader) {
s := bufio.NewScanner(r)
s.Split(bufio.ScanLines)
i := 0
for s.Scan() {
i++
}
fmt.Printf("Counted %d lines.\n", i)
}
func scanWordWise(r io.Reader) {
s := bufio.NewScanner(r)
s.Split(bufio.ScanWords)
i := 0
for s.Scan() {
i++
}
fmt.Printf("Counted %d words.\n", i)
}
Source
As more or less expected from streams scanLineWise
will count a number while scalWordWise
will count zero. This is because scanLineWise
already reads everything from req.Body
.
I would know like to know: How to solve this elegantly?
My first thought was to build a struct which implements io.Reader
and io.Writer
. We could use io.Copy
to read from req.Body
and write it to the writer
. When the scanners read from this writer then writer will copy the data instead of reading it. Unfortunately this will just collect memory over time and break the whole idea of streams...
The options are pretty straightforward -- you either maintain the "stream" of data, or you buffer the body.
If you really do need to read over the body more then once sequentially, you need to buffer it somewhere. There's no way around that.
There's a number of way you could stream the data, like having the line counter output lines into the word counter (preferably through channels). You could also build a pipeline using io.TeeReader
and io.Pipe
, and supply a unique reader for each function.
...
pipeReader, pipeWriter := io.Pipe()
bodyReader := io.TeeReader(res.Body, pipeWriter)
go scanLineWise(bodyReader)
go scanWordWise(pipeReader)
...
That can get unwieldy with more consumers though, so you could use io.MultiWriter
to multiplex to more io.Readers
.
...
pipeOneR, pipeOneW := io.Pipe()
pipeTwoR, pipeTwoW := io.Pipe()
pipeThreeR, pipeThreeW := io.Pipe()
go scanLineWise(pipeOneR)
go scanWordWise(pipeTwoR)
go scanSomething(pipeThreeR)
// of course, this should probably have some error handling
io.Copy(io.MultiWriter(pipeOneW, pipeTwoW, pipeThreeW), res.Body)
...
You could use channels, do the actual reading in your scanLineWise
then pass the lines to scanWordWise
, for example:
func countLines(r io.Reader) (ch chan string) {
ch = make(chan string)
go func() {
s := bufio.NewScanner(r)
s.Split(bufio.ScanLines)
cnt := 0
for s.Scan() {
ch <- s.Text()
cnt++
}
close(ch)
fmt.Printf("Counted %d lines.\n", cnt)
}()
return
}
func countWords(ch <-chan string) {
cnt := 0
for line := range ch {
s := bufio.NewScanner(strings.NewReader(line))
s.Split(bufio.ScanWords)
for s.Scan() {
cnt++
}
}
fmt.Printf("Counted %d words.\n", cnt)
}
func main() {
r := strings.NewReader(body)
ch := countLines(r)
go countWords(ch)
time.Sleep(1 * time.Second)
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With