
How to parse Apache Log File, Swiftly?

Suppose I have a log file that I've split into an array of strings. For example, I have these lines:

123.4.5.1 - - [03/Sep/2013:18:38:48 -0600] "GET /products/car/ HTTP/1.1" 200 3327 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.65 Safari/537.36"

123.4.5.6 - - [03/Sep/2013:18:38:58 -0600] "GET /jobs/ HTTP/1.1" 500 821 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:23.0) Gecko/20100101 Firefox/23.0"

I can parse these out with typical string manipulation; however, I think there is a better way to do it with regex. I attempted to follow a similar pattern that someone had used in Python, but I can't quite figure it out. Here's my attempt.

This is the pattern:

([(\d\.)]+) - - \[(.*?)\] "(.*?)" (\d+) - "(.*?)" "(.*?)"

When I attempt to use it, I get no matches.

let lines = contents.split(separator: "\n")
let pattern = "([(\\d\\.)]+) - - \\[(.*?)\\] \"(.*?)\" (\\d+) - \"(.*?)\" \"(.*?)\""
let regex = try! NSRegularExpression(pattern: pattern, options: [])
for line in lines {
    let range = NSRange(location: 0, length: line.utf16.count)
    let parsedData = regex.firstMatch(in: String(line), options: [], range: range)
    print(parsedData)
}

Ideally, I would extract the data into a model. The code needs to be fast, because there could be thousands of lines to process.

Expected Result

let someResult: (String, String, String, String, String, String)
or
let someObject: LogFile = LogFile(String, String, String...)

I'm looking for each parsed line to be broken up into its individual parts: IP, OS, OS version, browser, browser version, etc. Any real parsing of the data will be sufficient.

xTwisteDx asked Jan 25 '23 09:01


2 Answers

With your shown samples, please try the following regex.

^((?:\d+\.){3}\d+).*?\[([^]]*)\].*?"([^"]*)"\s*(\d+)\s*(\d+)\s*"-"\s*"([^"]*)"$

Online demo for above regex
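
As a quick local check, the sketch below runs both the pattern from the question and the pattern above against the first sample line. It is only an illustration: captures(of:in:) is a hypothetical helper, not part of the original answer, and the ] inside the character class is written as \] here, which should not change what matches. The question's pattern finds nothing because it expects " - " right after the status code, whereas the log has the byte count and a quoted "-" at that position.

```swift
import Foundation

/// Hypothetical helper: returns capture groups 1...n of the first match,
/// or nil when the pattern is invalid or does not match.
func captures(of pattern: String, in line: String) -> [String]? {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return nil }
    let fullRange = NSRange(line.startIndex..., in: line)
    guard let match = regex.firstMatch(in: line, range: fullRange) else { return nil }
    return (1..<match.numberOfRanges).map { index -> String in
        // A group that did not participate in the match becomes an empty string.
        Range(match.range(at: index), in: line).map { String(line[$0]) } ?? ""
    }
}

// First sample line from the question (raw string, so no escaping is needed).
let sampleLine = #"123.4.5.1 - - [03/Sep/2013:18:38:48 -0600] "GET /products/car/ HTTP/1.1" 200 3327 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.65 Safari/537.36""#

// Pattern from the question: requires " - " directly after the status code.
let questionPattern = #"([(\d\.)]+) - - \[(.*?)\] "(.*?)" (\d+) - "(.*?)" "(.*?)""#

// Pattern from this answer, with ] escaped inside the character class.
let answerPattern = #"^((?:\d+\.){3}\d+).*?\[([^\]]*)\].*?"([^"]*)"\s*(\d+)\s*(\d+)\s*"-"\s*"([^"]*)"$"#

print(captures(of: questionPattern, in: sampleLine) ?? ["no match"])
print(captures(of: answerPattern, in: sampleLine) ?? ["no match"])
```

Note that the answer's pattern only matches lines whose referer field is exactly "-", as in both sample lines.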

Explanation of the regex above:

^(                   ## Start of the line; open the 1st capturing group (the IP address).
   (?:\d+\.){3}\d+   ## Non-capturing group: 1 or more digits followed by a dot, repeated 3 times; then 1 or more digits.
)                    ## Close the 1st capturing group.
.*?\[                ## Match non-greedily up to and including [.
([^]]*)              ## 2nd capturing group: everything up to the next ] (the timestamp).
\].*?"               ## Match ] and then non-greedily up to the next ".
([^"]*)              ## 3rd capturing group: everything up to the next " (the request line).
"\s*                 ## Match " followed by 0 or more whitespace characters.
(\d+)                ## 4th capturing group: 1 or more digits (the status code).
\s*                  ## Match 0 or more whitespace characters.
(\d+)                ## 5th capturing group: 1 or more digits (the response size in bytes).
\s*"-"\s*"           ## 0 or more whitespace characters, a literal "-" (the empty referer), 0 or more whitespace characters, then ".
([^"]*)              ## 6th capturing group: everything up to the next " (the user agent).
"$                   ## Match the closing " at the end of the line.
RavinderSingh13 answered Jan 29 '23 14:01


The correct regex pattern is the one provided by @RavinderSingh13. However, I also want to add what I did to make it work in my code, so that others can use it in the future without having to search all of StackOverflow for answers.

I needed a way to parse an Apache log file into a usable object in Swift. The code is as follows.

Implement Extension

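import Foundation

extension String {
    /// Returns one array per match: index 0 is the whole match, later indices are the capture groups.
    func groups(for regexPattern: String) -> [[String]] {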
        do {
            let text = self
            let regex = try NSRegularExpression(pattern: regexPattern)
            let matches = regex.matches(in: text,
                                        range: NSRange(text.startIndex..., in: text))
            return matches.map { match in
                return (0..<match.numberOfRanges).map {
                    let rangeBounds = match.range(at: $0)
                    guard let range = Range(rangeBounds, in: text) else {
                        return ""
                    }
                    return String(text[range])
                }
            }
        } catch let error {
            print("invalid regex: \(error.localizedDescription)")
            return []
        }
    }
}

Create Model Object

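class EventLog {
    let ipAddress: String
    let date: String              // timestamp as logged, e.g. "03/Sep/2013:18:38:48 -0600"
    let getMethod: String         // full request line, e.g. "GET /jobs/ HTTP/1.1"
    let statusCode: String        // HTTP status code, e.g. "200"
    let secondStatusCode: String  // despite the name, this is the response size in bytes, e.g. "3327"
    let versionInfo: String       // raw User-Agent string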
    
    init(ipAddress: String, date: String, getMethod: String, statusCode: String, secondStatusCode: String, versionInfo: String ){
        self.ipAddress = ipAddress
        self.date = date
        self.getMethod = getMethod
        self.statusCode = statusCode
        self.secondStatusCode = secondStatusCode
        self.versionInfo = versionInfo
    }
}
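
The model keeps every field as a String. If the timestamp is needed as a Date, one possible approach is a DateFormatter with a fixed format. This is a minimal sketch: parseApacheDate and apacheDateFormatter are hypothetical names, and it assumes timestamps always look like the samples in the question (day, abbreviated English month, year, time, and a numeric UTC offset).

```swift
import Foundation

// Hypothetical shared formatter; creating a DateFormatter is relatively
// expensive, so it is built once and reused.
let apacheDateFormatter: DateFormatter = {
    let formatter = DateFormatter()
    // A fixed locale so that "Sep" parses regardless of the user's settings.
    formatter.locale = Locale(identifier: "en_US_POSIX")
    formatter.dateFormat = "dd/MMM/yyyy:HH:mm:ss Z"
    return formatter
}()

/// Hypothetical helper: converts a timestamp such as
/// "03/Sep/2013:18:38:48 -0600" into a Date, or nil if it does not fit the format.
func parseApacheDate(_ text: String) -> Date? {
    return apacheDateFormatter.date(from: text)
}

if let date = parseApacheDate("03/Sep/2013:18:38:48 -0600") {
    print(date)
}
```

The same idea applies to the numeric fields: Int(statusCode) returns nil for anything that is not a plain number, which can be used to reject malformed lines.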

Parse The Data

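I want to point out that the groups(for:) helper returns an [[String]]: one inner array per match, where index 0 is the whole match and indices 1 through 6 are the capture groups. You MUST take the subGroup out of the returned outer array before reading the fields, similar to unwrapping nested JSON.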

func parseData() {
    let documentsUrl: URL = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask).first!
    let destinationFileUrl = documentsUrl.appendingPathComponent("logfile.log")

    do {
        let contents = try String(contentsOf: destinationFileUrl, encoding: .utf8)
        let lines = contents.split(separator: "\n")
        // {3}\d+ captures the whole last octet; {3,}\d would keep only its first digit.
        let pattern = "^((?:\\d+\\.){3}\\d+).*?\\[([^]]*)\\].*?\"([^\"]*)\"\\s*(\\d+)\\s+(\\d+)\\s*\"-\"\\s*\"([^\"]*)\"$"
        var parsedLogs: [EventLog] = []
        for line in lines {
            // Skip lines that do not match instead of crashing on group[0].
            guard let subGroup = String(line).groups(for: pattern).first, subGroup.count == 7 else { continue }
            // Index 0 is the whole match; the capture groups start at index 1.
            let ipAddress = subGroup[1]
            let date = subGroup[2]
            let getMethod = subGroup[3]
            let statusCode = subGroup[4]
            let secondStatusCode = subGroup[5]
            let versionInfo = subGroup[6]

            parsedLogs.append(EventLog(ipAddress: ipAddress, date: date, getMethod: getMethod, statusCode: statusCode, secondStatusCode: secondStatusCode, versionInfo: versionInfo))
        }
        // Publish all results in a single hop to the main queue rather than one per line.
        DispatchQueue.main.async {
            self.eventLogs.append(contentsOf: parsedLogs)
        }
    } catch {
        print(error.localizedDescription)
    }
}
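
Since the question mentions thousands of lines, it may be worth compiling the regex once rather than on every call to groups(for:), and skipping lines that do not match. The following is a sketch under those assumptions, not a drop-in replacement: LogEntry, LogParser, and parseAll are hypothetical names, the ] inside the character class is escaped, and the pattern still only matches lines whose referer field is "-".

```swift
import Foundation

// Hypothetical value type; the names are illustrative and not from the code above.
struct LogEntry {
    let ipAddress: String
    let timestamp: String
    let request: String
    let statusCode: Int
    let responseBytes: Int
    let userAgent: String
}

struct LogParser {
    // Compiled once and reused for every line. The pattern is a constant,
    // so a failure here would be a programmer error, hence try!.
    private let regex = try! NSRegularExpression(
        pattern: #"^((?:\d+\.){3}\d+).*?\[([^\]]*)\].*?"([^"]*)"\s*(\d+)\s+(\d+)\s*"-"\s*"([^"]*)"$"#
    )

    /// Parses one line; returns nil when the line does not match.
    func parse(_ line: String) -> LogEntry? {
        let fullRange = NSRange(line.startIndex..., in: line)
        guard let match = regex.firstMatch(in: line, range: fullRange) else { return nil }
        // Capture groups 1...6; group 0 is the whole match.
        let fields = (1...6).map { index -> String in
            Range(match.range(at: index), in: line).map { String(line[$0]) } ?? ""
        }
        guard let status = Int(fields[3]), let bytes = Int(fields[4]) else { return nil }
        return LogEntry(ipAddress: fields[0], timestamp: fields[1], request: fields[2],
                        statusCode: status, responseBytes: bytes, userAgent: fields[5])
    }

    /// Parses a whole file's contents, dropping lines that do not match.
    func parseAll(_ contents: String) -> [LogEntry] {
        return contents.split(separator: "\n").compactMap { parse(String($0)) }
    }
}

// The two sample lines from the question, plus one line that should be skipped.
let sampleLog = """
123.4.5.1 - - [03/Sep/2013:18:38:48 -0600] "GET /products/car/ HTTP/1.1" 200 3327 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.65 Safari/537.36"
this line is not a log entry
123.4.5.6 - - [03/Sep/2013:18:38:58 -0600] "GET /jobs/ HTTP/1.1" 500 821 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:23.0) Gecko/20100101 Firefox/23.0"
"""

let entries = LogParser().parseAll(sampleLog)
for entry in entries {
    print(entry.ipAddress, entry.statusCode, entry.responseBytes)
}
```

A struct with value-typed fields is used here so that a parsed array can be handed to the main queue in one step; whether that fits depends on how eventLogs is consumed in the surrounding app.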
xTwisteDx answered Jan 29 '23 14:01