Suppose I have a log file that I've split into an array of strings. For example I have these lines here.
123.4.5.1 - - [03/Sep/2013:18:38:48 -0600] "GET /products/car/ HTTP/1.1" 200 3327 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.65 Safari/537.36"
123.4.5.6 - - [03/Sep/2013:18:38:58 -0600] "GET /jobs/ HTTP/1.1" 500 821 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:23.0) Gecko/20100101 Firefox/23.0"
I can parse these out with typical string manipulation, but I think there is a better way to do it with a regex. I attempted to follow a similar pattern that someone had used in Python, but I can't quite figure it out. Here's my attempt.
This is the pattern: ([(\d\.)]+) - - \[(.*?)\] "(.*?)" (\d+) - "(.*?)" "(.*?)". When I attempt to use it, I get no matches.
let lines = contents.split(separator: "\n")
let pattern = "([(\\d\\.)]+) - - \\[(.*?)\\] \"(.*?)\" (\\d+) - \"(.*?)\" \"(.*?)\""
let regex = try! NSRegularExpression(pattern: pattern, options: [])
for line in lines {
let range = NSRange(location: 0, length: line.utf16.count)
let parsedData = regex.firstMatch(in: String(line), options: [], range: range)
print(parsedData)
}
Ideally I would extract the data into a model. The code needs to be fast, because there could be thousands of lines to process.
let someResult = (String, String, String, String, String, String) or
let someObject: LogFile = LogFile(String, String, String...)
I would be looking for the parsed line to be broken up into its individual parts: IP, OS, OS version, browser, browser version, etc. Any real parsing of the data will be sufficient.
With your shown samples, could you please try the following regex.
^((?:\d+\.){3}\d+).*?\[([^]]*)\].*?"([^"]*)"\s*(\d+)\s*(\d+)\s*"-"\s*"([^"]*)"$
Online demo for above regex
Explanation: Adding detailed explanation for above.
^( ##Starting a capturing group checking from starting of value here.
(?:\d+\.){3}\d+ ##A non-capturing group matching 1 or more digits followed by a dot, repeated 3 times, then 1 or more digits.
) ##Closing 1st capturing group here.
.*?\[ ##Matching non greedy till [ here.
([^]]*) ##Creating 2nd capturing group till ] here.
\].*?" ##Matching ] and non greedy till " here.
([^"]*) ##Creating 3rd capturing group which has values till " here.
"\s* ##Matching " followed by zero or more whitespace characters here.
(\d+) ##Creating 4th capturing group here which has all digits here.
\s* ##Matching zero or more whitespace characters here.
(\d+) ##Creating 5th capturing group here which has all digits here.
\s*"-"\s*" ##Zero or more whitespace characters, then "-", then zero or more whitespace characters followed by " here.
([^"]*) ##Creating 6th capturing group till " here.
"$ ##Matching " at last.
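As a rough illustration of how the pattern above behaves in Swift, here is a minimal sketch that runs it against one of the sample lines from the question. The names `sampleLine`, `logPattern`, and `captureGroups` are made up for this example, and the raw string literals (`#"..."#`) assume Swift 5 or later; they are used only so the pattern can be pasted without extra escaping.

```swift
import Foundation

// Sample line taken from the question.
let sampleLine = #"123.4.5.6 - - [03/Sep/2013:18:38:58 -0600] "GET /jobs/ HTTP/1.1" 500 821 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:23.0) Gecko/20100101 Firefox/23.0""#

// The pattern from this answer, written as a raw string so no backslash needs doubling.
let logPattern = #"^((?:\d+\.){3}\d+).*?\[([^]]*)\].*?"([^"]*)"\s*(\d+)\s*(\d+)\s*"-"\s*"([^"]*)"$"#

/// Hypothetical helper: returns the captured groups (1...6) for one line,
/// or nil if the pattern is invalid or the line does not match.
func captureGroups(in line: String, pattern: String) -> [String]? {
    guard let regex = try? NSRegularExpression(pattern: pattern),
          let match = regex.firstMatch(in: line,
                                       options: [],
                                       range: NSRange(line.startIndex..., in: line))
    else { return nil }
    // Group 0 is the whole match, so start at 1 to get only the captures.
    return (1..<match.numberOfRanges).map { index in
        Range(match.range(at: index), in: line).map { String(line[$0]) } ?? ""
    }
}

if let fields = captureGroups(in: sampleLine, pattern: logPattern) {
    // fields[0] is the IP, fields[1] the timestamp, fields[2] the request line,
    // fields[3] the status code, fields[4] the second number, fields[5] the user agent.
    print(fields)
}
```

A line that does not fit the pattern returns nil instead of crashing, which makes it straightforward to skip malformed entries.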
The correct regex pattern is the one provided by @RavinderSingh13. However, I also want to add what I did to make it work within my code, so that others can use it in the future without having to search all of Stack Overflow for answers.
I needed to find a way to parse an Apache log file into a usable object within Swift. The code is as follows.
extension String {
    func groups(for regexPattern: String) -> [[String]] {
        do {
            let text = self
            let regex = try NSRegularExpression(pattern: regexPattern)
            let matches = regex.matches(in: text,
                                        range: NSRange(text.startIndex..., in: text))
            return matches.map { match in
                return (0..<match.numberOfRanges).map {
                    let rangeBounds = match.range(at: $0)
                    guard let range = Range(rangeBounds, in: text) else {
                        return ""
                    }
                    return String(text[range])
                }
            }
        } catch let error {
            print("invalid regex: \(error.localizedDescription)")
            return []
        }
    }
}
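class EventLog {
    let ipAddress: String
    let date: String
    // Holds the whole request line, e.g. "GET /jobs/ HTTP/1.1".
    let getMethod: String
    let statusCode: String
    // In the sample lines this second number follows the status code (3327, 821);
    // in Apache's combined log format that field is the response size in bytes.
    let secondStatusCode: String
    // Holds the full user-agent string.
    let versionInfo: String
    init(ipAddress: String, date: String, getMethod: String, statusCode: String, secondStatusCode: String, versionInfo: String) {
        self.ipAddress = ipAddress
        self.date = date
        self.getMethod = getMethod
        self.statusCode = statusCode
        self.secondStatusCode = secondStatusCode
        self.versionInfo = versionInfo
    }
}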
I want to point out that groups(for:) returns an [[String]], with one inner array per match, so you MUST get the subGroup from the returned overarching group first. This is similar to unwrapping nested JSON.
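To make that nested shape concrete, here is a small sketch using a deliberately simple pattern. `allGroups` is a hypothetical, compact stand-in for the `groups(for:)` helper above, written as a free function so the example is self-contained.

```swift
import Foundation

// Hypothetical stand-in for groups(for:): one inner array per match.
// In each inner array, index 0 is the whole match and 1... are the capture groups.
func allGroups(in text: String, pattern: String) -> [[String]] {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, options: [], range: fullRange).map { match in
        (0..<match.numberOfRanges).map { index in
            Range(match.range(at: index), in: text).map { String(text[$0]) } ?? ""
        }
    }
}

let result = allGroups(in: "10-20 30-40", pattern: #"(\d+)-(\d+)"#)
// result[0] is the first match:  ["10-20", "10", "20"]
// result[1] is the second match: ["30-40", "30", "40"]

// When nothing matches, the outer array is empty, so `first` is safer than `[0]`.
let firstMatch = result.first
```

Since each log line is matched against an anchored pattern (`^...$`), the outer array for a line should hold at most one inner array, which is why only the first element is read.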
func parseData() {
    let documentsUrl: URL = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask).first!
    let destinationFileUrl = documentsUrl.appendingPathComponent("logfile.log")
    do {
        let contents = try String(contentsOf: destinationFileUrl, encoding: .utf8)
        let lines = contents.split(separator: "\n")
        // {3}\\d+ (as in the answer above) captures the full last octet; {3,}\\d would keep only its first digit.
        let pattern = "^((?:\\d+\\.){3}\\d+).*?\\[([^]]*)\\].*?\"([^\"]*)\"\\s*(\\d+)\\s+(\\d+)\\s*\"-\"\\s*\"([^\"]*)\"$"
        for line in lines {
            let group = String(line).groups(for: pattern)
            // Skip lines that do not match instead of crashing on group[0].
            guard let subGroup = group.first, subGroup.count == 7 else { continue }
            let ipAddress = subGroup[1]
            let date = subGroup[2]
            let getMethod = subGroup[3]
            let statusCode = subGroup[4]
            let secondStatusCode = subGroup[5]
            let versionInfo = subGroup[6]
            DispatchQueue.main.async {
                self.eventLogs.append(EventLog(ipAddress: ipAddress, date: date, getMethod: getMethod, statusCode: statusCode, secondStatusCode: secondStatusCode, versionInfo: versionInfo))
            }
        }
    } catch {
        print(error.localizedDescription)
    }
}
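Since the question mentions thousands of lines, one variation worth considering is to compile the regex once and reuse it for every line, because `groups(for:)` as written builds a new `NSRegularExpression` on each call. The sketch below is only an illustration under that assumption: `LogEntry` and `parseLog` are hypothetical names, the function takes the file contents as a string rather than reading from disk, and it returns the entries instead of appending to `eventLogs`.

```swift
import Foundation

// Hypothetical value type mirroring the six captured fields.
struct LogEntry: Equatable {
    let ipAddress: String
    let date: String
    let request: String
    let statusCode: String
    let responseSize: String
    let userAgent: String
}

// Hypothetical parser: compiles the pattern once, then reuses it for every line.
func parseLog(_ contents: String) -> [LogEntry] {
    let pattern = #"^((?:\d+\.){3}\d+).*?\[([^]]*)\].*?"([^"]*)"\s*(\d+)\s+(\d+)\s*"-"\s*"([^"]*)"$"#
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }

    var entries: [LogEntry] = []
    for line in contents.split(separator: "\n") {
        let text = String(line)
        let range = NSRange(text.startIndex..., in: text)
        // Lines that do not match the pattern are skipped rather than treated as errors.
        guard let match = regex.firstMatch(in: text, options: [], range: range) else { continue }
        let fields: [String] = (1...6).map { index in
            Range(match.range(at: index), in: text).map { String(text[$0]) } ?? ""
        }
        entries.append(LogEntry(ipAddress: fields[0], date: fields[1], request: fields[2],
                                statusCode: fields[3], responseSize: fields[4], userAgent: fields[5]))
    }
    return entries
}
```

Returning the whole array also means the caller can hand the results to the main queue in a single update, rather than scheduling one `DispatchQueue.main.async` block per line.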