
Read log file from the end and get the offset of a particular string

Tags: file, io, go

For example, given a log file with these lines:

  • Start
  • Line1
  • Line2
  • Line3
  • End

I am able to get the seek position of Line1 when I read the file from the beginning.

func getSeekLocation() int64 {
    start := int64(0)
    input, err := os.Open(logFile)
    if err != nil {
        fmt.Println(err)
    }
    if _, err := input.Seek(start, io.SeekStart); err != nil {
        fmt.Println(err)
    }
    scanner := bufio.NewScanner(input)

    pos := start
    scanLines := func(data []byte, atEOF bool) (advance int, token []byte, err error) {
        advance, token, err = bufio.ScanLines(data, atEOF)
        pos += int64(advance)
        return
    }
    scanner.Split(scanLines)
    for scanner.Scan() {
        if strings.Contains(scanner.Text(), "Line1") {
            break
        }
    }
    size, err := getFileSize()
    if err != nil {
        fmt.Println(err)
    }
    return size - pos
}

But this is not an efficient way to solve the problem: as the file grows, so does the time needed to find the location. I would like to find the line's location by searching from the end of the file (EOF), which I think would be more efficient.

sks asked Dec 18 '22 03:12


1 Answer

Note: I optimized and improved the below solution, and released it as a library here: github.com/icza/backscanner


bufio.Scanner uses an io.Reader as its source, which does not support seeking and / or reading from arbitrary positions, so it is not capable of scanning lines from the end. bufio.Scanner can only read any part of the input once all data preceding it has already been read (that is, it can only read the end of the file if it reads all the file's content first).

So we need a custom solution to implement such functionality. Fortunately os.File does support reading from arbitrary positions, as it implements both io.Seeker and io.ReaderAt (either one would be sufficient for what we need).

Scanner that returns lines going backward, starting at the end

Let's construct a Scanner which scans lines backward, starting with the last line. For this, we'll utilize an io.ReaderAt. The following implementation uses an internal buffer into which data is read in chunks, starting from the end of the input. The size of the input must also be passed; this is really the position to start reading back from, which need not be the end of the input.

type Scanner struct {
    r   io.ReaderAt
    pos int
    err error
    buf []byte
}

func NewScanner(r io.ReaderAt, pos int) *Scanner {
    return &Scanner{r: r, pos: pos}
}

func (s *Scanner) readMore() {
    if s.pos == 0 {
        s.err = io.EOF
        return
    }
    size := 1024
    if size > s.pos {
        size = s.pos
    }
    s.pos -= size
    buf2 := make([]byte, size, size+len(s.buf))

    // ReadAt fills the whole buffer or returns a non-nil error.
    _, s.err = s.r.ReadAt(buf2, int64(s.pos))
    if s.err == nil {
        s.buf = append(buf2, s.buf...)
    }
}

func (s *Scanner) Line() (line string, start int, err error) {
    if s.err != nil {
        return "", 0, s.err
    }
    for {
        lineStart := bytes.LastIndexByte(s.buf, '\n')
        if lineStart >= 0 {
            // We have a complete line:
            var line string
            line, s.buf = string(dropCR(s.buf[lineStart+1:])), s.buf[:lineStart]
            return line, s.pos + lineStart + 1, nil
        }
        // Need more data:
        s.readMore()
        if s.err != nil {
            if s.err == io.EOF {
                if len(s.buf) > 0 {
                    return string(dropCR(s.buf)), 0, nil
                }
            }
            return "", 0, s.err
        }
    }
}

// dropCR drops a terminal \r from the data.
func dropCR(data []byte) []byte {
    if len(data) > 0 && data[len(data)-1] == '\r' {
        return data[0 : len(data)-1]
    }
    return data
}

Example using it:

func main() {
    scanner := NewScanner(strings.NewReader(src), len(src))
    for {
        line, pos, err := scanner.Line()
        if err != nil {
            fmt.Println("Error:", err)
            break
        }
        fmt.Printf("Line start: %2d, line: %s\n", pos, line)
    }
}

const src = `Start
Line1
Line2
Line3
End`

Output:

Line start: 24, line: End
Line start: 18, line: Line3
Line start: 12, line: Line2
Line start:  6, line: Line1
Line start:  0, line: Start
Error: EOF

Notes:

  • The above Scanner does not limit the maximum line length; it handles lines of any length.
  • The above Scanner handles both \n and \r\n line endings (ensured by the dropCR() function).
  • You may pass any starting position, not just the size / length, and lines will be listed backward from there (continuation).
  • The above Scanner does not reuse buffers; it always creates new ones when needed. It would be enough to (pre)allocate 2 buffers and use those wisely, but the implementation would become more complex, and it would introduce a max line length limit.
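
To make the note about line endings concrete, here is a minimal sketch of dropCR() in isolation. It assumes a line has already been cut at '\n', so a Windows-style line still carries a trailing '\r'. The sample inputs are made up for illustration.

```go
package main

import "fmt"

// dropCR drops a terminal \r from the data (same helper as in the Scanner).
func dropCR(data []byte) []byte {
	if len(data) > 0 && data[len(data)-1] == '\r' {
		return data[:len(data)-1]
	}
	return data
}

func main() {
	// Only a trailing '\r' is removed; other content is left untouched.
	for _, raw := range []string{"Line1", "Line1\r", ""} {
		fmt.Printf("%q -> %q\n", raw, string(dropCR([]byte(raw))))
	}
}
```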

Using it with a file

To use this Scanner with a file, open the file with os.Open(). Note that *os.File implements io.ReaderAt. Then call File.Stat() to obtain information about the file (os.FileInfo), including its size (length):

f, err := os.Open("a.txt")
if err != nil {
    panic(err)
}
defer f.Close() // schedule the close as soon as the file is open

fi, err := f.Stat()
if err != nil {
    panic(err)
}

scanner := NewScanner(f, int(fi.Size()))

Looking for a substring in a line

If you're looking for a substring in a line, then simply use the above Scanner which returns the starting pos of each line, reading lines from the end.

You may check for the substring in each line using strings.Index(), which returns the substring's position inside the line; if it is found, add the line's start position to get the global position.

Let's say we're looking for the "ine2" substring (which is part of the "Line2" line). Here's how you can do that:

scanner := NewScanner(strings.NewReader(src), len(src))
what := "ine2"
for {
    line, pos, err := scanner.Line()
    if err != nil {
        fmt.Println("Error:", err)
        break
    }
    fmt.Printf("Line start: %2d, line: %s\n", pos, line)

    if i := strings.Index(line, what); i >= 0 {
        fmt.Printf("Found %q at line position: %d, global position: %d\n",
            what, i, pos+i)
        break
    }
}

Output:

Line start: 24, line: End
Line start: 18, line: Line3
Line start: 12, line: Line2
Found "ine2" at line position: 1, global position: 13

icza answered Dec 25 '22 22:12