Text parsing of a log file in Go

Tags:

go

Go newbie here!

I am trying to put together a Go program that will parse a log file and return specific information from the lines that match.

To give an example of what I am trying to achieve, I would start with a log file that looks like this:

2019-09-30T04:17:02 - REQUEST-A
2019-09-30T04:18:02 - REQUEST-C
2019-09-30T04:19:02 - REQUEST-B
2019-09-30T04:20:02 - REQUEST-A
2019-09-30T04:21:02 - REQUEST-A
2019-09-30T04:22:02 - REQUEST-B

From here I want to extract every "REQUEST-A" entry and print the time each request occurred, either to the terminal or to a file.

I have tried using os.Open and a bufio.Scanner, and I can use scanner.Text to detect that it has found an occurrence of my string, like so:

package main

import (
    "bufio"
    "fmt"
    "os"
    "strings"
)

func main() {
    request := 0
    f, err := os.Open("request.log")
    if err != nil {
        fmt.Println("There has been an error!:", err)
        return
    }
    defer f.Close()
    scanner := bufio.NewScanner(f)

    for scanner.Scan() {
        if strings.Contains(scanner.Text(), "REQUEST-A") {
            request = request + 1
        }
    }
    if err := scanner.Err(); err != nil {
        fmt.Println(err)
    }
    fmt.Println(request)
}

But I am unsure how to take this further to retrieve the information I am after. Normally I would use Bash for this, but I thought I would branch out and see if I could use Go. Any advice would be appreciated.

asked Mar 03 '23 08:03 by CJW


2 Answers

In Go, we try to be efficient. Don't do things unnecessarily.

For example,

package main

import (
    "bufio"
    "bytes"
    "fmt"
    "os"
)

func main() {
    lines, requestA := 0, 0
    f, err := os.Open("request.log")
    if err != nil {
        fmt.Println("There has been an error!:", err)
        return
    }
    defer f.Close()

    scanner := bufio.NewScanner(f)
    for scanner.Scan() {
        lines++
        // filter request a
        line := scanner.Bytes()
        if len(line) <= 30 || line[30] != 'A' {
            continue
        }
        if !bytes.Equal(line[22:], []byte("REQUEST-A")) {
            continue
        }
        requestA++
        request := string(line)

        // handle request a
        fmt.Println(request)
    }
    if err := scanner.Err(); err != nil {
        fmt.Println(err)
    }
    fmt.Println(lines, requestA)
}

Output:

$ go run request.go

2019-09-30T04:17:02 - REQUEST-A
2019-09-30T04:20:02 - REQUEST-A
2019-09-30T04:21:02 - REQUEST-A
6 3

$ cat request.log
2019-09-30T04:17:02 - REQUEST-A
2019-09-30T04:18:02 - REQUEST-C
2019-09-30T04:19:02 - REQUEST-B
2019-09-30T04:20:02 - REQUEST-A
2019-09-30T04:21:02 - REQUEST-A
2019-09-30T04:22:02 - REQUEST-B

To emphasize the importance of efficiency (logs can be very large), let's run a benchmark against Markus W Mahlberg's solution: https://play.golang.org/p/R2D_BeiJvx9.

$ go test log_test.go -bench=. -benchmem
BenchmarkPeterSO-4   21285     56953 ns/op    4128 B/op      2 allocs/op
BenchmarkMarkusM-4     649   1817868 ns/op   84747 B/op   2390 allocs/op

log_test.go:

package main

import (
    "bufio"
    "bytes"
    "regexp"
    "strings"
    "testing"
)

var requestLog = `
2019-09-30T04:17:02 - REQUEST-A
2019-09-30T04:18:02 - REQUEST-C
2019-09-30T04:19:02 - REQUEST-B
2019-09-30T04:20:02 - REQUEST-A
2019-09-30T04:21:02 - REQUEST-A
2019-09-30T04:22:02 - REQUEST-B
`

var benchLog = strings.Repeat(requestLog[1:], 256)

func BenchmarkPeterSO(b *testing.B) {
    for N := 0; N < b.N; N++ {
        scanner := bufio.NewScanner(strings.NewReader(benchLog))
        for scanner.Scan() {
            // filter request a
            line := scanner.Bytes()
            if len(line) <= 30 || line[30] != 'A' {
                continue
            }
            if !bytes.Equal(line[22:], []byte("REQUEST-A")) {
                continue
            }
            request := string(line)
            // handle request a
            _ = request
        }
        if err := scanner.Err(); err != nil {
            b.Fatal(err)
        }
    }
}

func BenchmarkMarkusM(b *testing.B) {
    for N := 0; N < b.N; N++ {
        var re *regexp.Regexp = regexp.MustCompile(`^(\S*) - REQUEST-A$`)
        scanner := bufio.NewScanner(strings.NewReader(benchLog))
        var res []string
        for scanner.Scan() {
            if res = re.FindStringSubmatch(scanner.Text()); len(res) > 0 {
                _ = res[1]
            }
        }
        if err := scanner.Err(); err != nil {
            b.Fatal(err)
        }
    }
}
answered Mar 20 '23 14:03 by peterSO


Use the following code to print the time field of log entries whose value field is "REQUEST-A".

for scanner.Scan() {
    line := scanner.Text()
    if len(line) < 19 {
        continue
    }
    if line[19:] == " - REQUEST-A" {
        fmt.Println(line[:19])
    }
}

Run it on the Go playground!

To write to a file, redirect stdout to a file.

The code above assumes that everything after the timestamp is exactly " - REQUEST-A". Use the following if " - REQUEST-A" may be followed by other data:

const lenTimestamp = 19
for scanner.Scan() {
    line := scanner.Text()
    if len(line) < lenTimestamp {
        continue
    }
    if strings.HasPrefix(line[lenTimestamp:], " - REQUEST-A") {
        fmt.Println(line[:lenTimestamp])
    }
}

Run this version on the playground.

answered Mar 20 '23 15:03 by nhoorical