
How to read a text file line-by-line in Go when some lines are long enough to cause "bufio.Scanner: token too long" errors?

I have a text file where each line represents a JSON object. I am processing this file in Go with a simple for loop like this:

scanner := bufio.NewScanner(file)
for scanner.Scan() {
   jsonBytes := scanner.Bytes()
   var jsonObject interface{}
   if err := json.Unmarshal(jsonBytes, &jsonObject); err != nil {
      log.Fatal(err)
   }

   // do stuff with "jsonObject"...

}
if err := scanner.Err(); err != nil {
   log.Fatal(err)
}

When this code reaches a line containing a particularly large JSON string (~67 KB), it fails with the error "bufio.Scanner: token too long".

Is there an easy way to increase the maximum line size that a bufio.Scanner can read? Or is there another approach altogether for reading lines that are too long for the default Scanner but are known not to be unsafely large?

Steve Perkins asked Jan 14 '14 21:01


3 Answers

You can also do:

scanner := bufio.NewScanner(file)
buf := make([]byte, 0, 64*1024)
scanner.Buffer(buf, 1024*1024)
for scanner.Scan() {
    // do your stuff
}

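The second argument to scanner.Buffer() sets the maximum size the scanner's buffer may grow to, which in turn bounds the token size (the default limit is bufio.MaxScanTokenSize, 64 KB). In the example above, you can scan the file as long as every line is shorter than 1 MB. Buffer must be called before the first call to Scan.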

lorserker answered Oct 14 '22 05:10


From the package docs:

Programs that need more control over error handling or large tokens, or must run sequential scans on a reader, should use bufio.Reader instead.

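It looks like the preferred solution is bufio.Reader, for example its ReadLine method, although the ReadLine documentation itself notes that most callers should use ReadBytes('\n') or ReadString('\n') instead.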

Peter Milley answered Oct 14 '22 05:10


You surely don't want to be reading line-by-line in the first place. Why don't you just do this:

d := json.NewDecoder(file)
for {
   var ob whateverType
   err := d.Decode(&ob)
   if err == io.EOF {
       break
   }
   if err != nil {
       log.Fatalf("Error decoding: %v", err)
   }

   // do stuff with "ob"...

}

Dustin answered Oct 14 '22 05:10