How can I read lines from a file where the line endings are carriage return (CR), line feed (LF), or both?
The PDF specification allows lines to end with CR, LF, or CRLF.
bufio.Reader.ReadString() and bufio.Reader.ReadBytes() allow a single delimiter byte. bufio.Scanner.Scan() handles \n optionally preceded by \r, but not a lone \r.
The bufio.ScanLines documentation states: "The end-of-line marker is one optional carriage return followed by one mandatory newline."
Do I need to write my own function that uses bufio.Reader.ReadByte()?
Use a text editor like Notepad++ to see which line endings a file uses. It shows the line-ending format as Unix (LF), Macintosh (CR), or Windows (CR LF) in the status bar. You can also go to View -> Show Symbol -> Show End of Line to display each line end as LF, CR LF, or CR.
Alternatively, open any text file and click the pilcrow (¶) button. Notepad++ then shows all characters, including the end-of-line characters. In a file with Windows line endings, each line ends with CR LF (\r\n). In a file with Unix line endings, each line ends with LF (\n) only, and in a file with classic Mac line endings, with CR (\r) only.
You can use vim -b filename to edit a file in binary mode, which shows each carriage return as ^M, while each line break on screen corresponds to an LF. A ^M at the end of every line therefore indicates Windows CRLF line endings. By LF I mean \n and by CR I mean \r.
The carriage return character, also referred to as Ctrl+M, shows up as octal 015 if you look at the file with the od (octal dump) command, for example od -b, and as \r with od -c. CRLF denotes the carriage return and line feed sequence that ends lines in Windows text files.
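The same check can be done programmatically in Go. The following is only a sketch; the names lineEndingCounts and countLineEndings are made up for illustration, and it assumes the whole file fits in memory as a byte slice (for example, read with os.ReadFile).

```go
package main

// lineEndingCounts records how often each end-of-line style occurs.
type lineEndingCounts struct{ CR, LF, CRLF int }

// countLineEndings is a hypothetical helper that classifies every line
// ending in data as a lone CR, a lone LF, or a CRLF pair.
func countLineEndings(data []byte) lineEndingCounts {
	var c lineEndingCounts
	for i := 0; i < len(data); i++ {
		switch data[i] {
		case '\r':
			if i+1 < len(data) && data[i+1] == '\n' {
				c.CRLF++
				i++ // skip the LF that belongs to this CRLF pair
			} else {
				c.CR++
			}
		case '\n':
			c.LF++
		}
	}
	return c
}
```

A file that reports non-zero counts in more than one field has mixed line endings, which is the case the PDF format permits.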
You can write a custom bufio.SplitFunc for bufio.Scanner. E.g.:
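import "bytes"

// ScanPDFLines is a bufio.SplitFunc that ends a line at CR, LF, or CRLF.
// Mostly bufio.ScanLines code:
func ScanPDFLines(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	if i := bytes.IndexAny(data, "\r\n"); i >= 0 {
		if data[i] == '\n' {
			// We have a line terminated by a single newline.
			return i + 1, data[0:i], nil
		}
		// data[i] is '\r'. If it is the last byte read so far, request more
		// data so that a CRLF split across two reads is not seen as two line ends.
		if i == len(data)-1 && !atEOF {
			return 0, nil, nil
		}
		advance = i + 1
		if len(data) > i+1 && data[i+1] == '\n' {
			// CRLF: consume the newline as well.
			advance++
		}
		return advance, data[0:i], nil
	}
	// If we're at EOF, we have a final, non-terminated line. Return it.
	if atEOF {
		return len(data), data, nil
	}
	// Request more data.
	return 0, nil, nil
}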
And use it like:
scan := bufio.NewScanner(r)
scan.Split(ScanPDFLines)