Suppose I have a log file that I've split into an array of strings. For example, it contains lines like these:
123.4.5.1 - - [03/Sep/2013:18:38:48 -0600] "GET /products/car/ HTTP/1.1" 200 3327 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.65 Safari/537.36"
123.4.5.6 - - [03/Sep/2013:18:38:58 -0600] "GET /jobs/ HTTP/1.1" 500 821 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:23.0) Gecko/20100101 Firefox/23.0"
I can parse these with typical string manipulation, but I think there is a better way to do it with a regex. I tried to follow a similar pattern that someone had used in Python, but I can't quite figure it out. Here's my attempt.
This is the pattern: ([(\d\.)]+) - - \[(.*?)\] "(.*?)" (\d+) - "(.*?)" "(.*?)". When I attempt to use it, I get no matches.
let lines = contents.split(separator: "\n")
let pattern = "([(\\d\\.)]+) - - \\[(.*?)\\] \"(.*?)\" (\\d+) - \"(.*?)\" \"(.*?)\""
let regex = try! NSRegularExpression(pattern: pattern, options: [])
for line in lines {
    let range = NSRange(location: 0, length: line.utf16.count)
    let parsedData = regex.firstMatch(in: String(line), options: [], range: range)
    print(parsedData)
}
Ideally I would extract the data into a model. The code needs to be fast, because there could be thousands of lines to process.
let someResult = (String, String, String, String, String, String) or
let someObject: LogFile = LogFile(String, String, String...)
I would like the parsed line to be broken up into its individual parts: IP, OS, OS version, browser, browser version, etc. Any real parsing of the data will be sufficient.
With your shown samples, could you please try the following.
^((?:\d+\.){3}\d+).*?\[([^]]*)\].*?"([^"]*)"\s*(\d+)\s*(\d+)\s*"-"\s*"([^"]*)"$
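As a quick way to try this pattern from Swift, here is a minimal sketch. It assumes a Swift version with raw string literals (5.0 or later), and the helper name `captureFields(from:pattern:)` is hypothetical, not part of any library. The sample line is the second one from the question.

```swift
import Foundation

// Sample line taken from the question.
let sampleLine = #"123.4.5.6 - - [03/Sep/2013:18:38:58 -0600] "GET /jobs/ HTTP/1.1" 500 821 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:23.0) Gecko/20100101 Firefox/23.0""#

// The same pattern, written as a raw string so that no backslash or quote needs escaping.
let apachePattern = #"^((?:\d+\.){3}\d+).*?\[([^]]*)\].*?"([^"]*)"\s*(\d+)\s*(\d+)\s*"-"\s*"([^"]*)"$"#

/// Hypothetical helper: returns the captured fields, or nil if the line does not match.
func captureFields(from line: String, pattern: String) -> [String]? {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return nil }
    let fullRange = NSRange(line.startIndex..., in: line)
    guard let match = regex.firstMatch(in: line, options: [], range: fullRange) else { return nil }
    // Range 0 is the whole match; ranges 1 onward are the capturing groups.
    return (1..<match.numberOfRanges).compactMap { index -> String? in
        Range(match.range(at: index), in: line).map { String(line[$0]) }
    }
}

if let fields = captureFields(from: sampleLine, pattern: apachePattern) {
    // fields[0] is the IP address, fields[3] the status code, fields[5] the user agent.
    print(fields)
}
```

The raw string literal keeps the pattern identical to what you would paste into a regex tester, which makes it easier to compare the two.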
Explanation: a detailed breakdown of the regex, piece by piece.
^( ##Anchor at the start of the line and open the 1st capturing group.
(?:\d+\.){3}\d+ ##Non-capturing group: one or more digits followed by a dot, repeated 3 times, then one or more digits (the IP address).
) ##Close the 1st capturing group.
.*?\[ ##Match lazily up to the first [.
([^]]*) ##2nd capturing group: everything up to the next ] (the timestamp).
\].*?" ##Match ] and then lazily up to the next ".
([^"]*) ##3rd capturing group: everything up to the next " (the request line).
"\s* ##Match " followed by zero or more whitespace characters.
(\d+) ##4th capturing group: one or more digits (the status code).
\s* ##Match zero or more whitespace characters.
(\d+) ##5th capturing group: one or more digits (the response size).
\s*"-"\s*" ##Zero or more whitespace characters, "-", zero or more whitespace characters, then ".
([^"]*) ##6th capturing group: everything up to the next " (the user agent).
"$ ##Match the closing " at the end of the line.
The correct regex pattern is the one provided by @RavinderSingh13. However, I also want to show what I did to make it work in my code, so that others can use it in the future without having to search all of Stack Overflow for answers.
I needed a way to parse an Apache log file into a usable object in Swift. The code is as follows.
extension String {
    func groups(for regexPattern: String) -> [[String]] {
        do {
            let text = self
            let regex = try NSRegularExpression(pattern: regexPattern)
            let matches = regex.matches(in: text,
                                        range: NSRange(text.startIndex..., in: text))
            return matches.map { match in
                return (0..<match.numberOfRanges).map {
                    let rangeBounds = match.range(at: $0)
                    guard let range = Range(rangeBounds, in: text) else {
                        return ""
                    }
                    return String(text[range])
                }
            }
        } catch let error {
            print("invalid regex: \(error.localizedDescription)")
            return []
        }
    }
}
class EventLog {
    let ipAddress: String
    let date: String
    // The full request line, e.g. "GET /jobs/ HTTP/1.1".
    let getMethod: String
    let statusCode: String
    // The number after the status code; in Apache's combined log format this is the response size in bytes.
    let secondStatusCode: String
    // The user-agent string.
    let versionInfo: String

    init(ipAddress: String, date: String, getMethod: String, statusCode: String, secondStatusCode: String, versionInfo: String) {
        self.ipAddress = ipAddress
        self.date = date
        self.getMethod = getMethod
        self.statusCode = statusCode
        self.secondStatusCode = secondStatusCode
        self.versionInfo = versionInfo
    }
}
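I want to point out that groups(for:) returns an [[String]]: one inner array per match, where index 0 is the whole match and indices 1 onward are the capture groups. You MUST therefore take the inner array (subGroup) out of the outer array before reading the fields, similar to drilling into parsed JSON.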
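To make the shape of that result concrete, here is a small sketch on a toy input. The free function `captureGroups(in:pattern:)` is a hypothetical stand-in for the extension, included only so the example runs on its own; the input string and pattern are made up for illustration.

```swift
import Foundation

// Hypothetical stand-in for the groups(for:) extension, so the example is self-contained.
func captureGroups(in text: String, pattern: String) -> [[String]] {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let matches = regex.matches(in: text, options: [], range: NSRange(text.startIndex..., in: text))
    return matches.map { match -> [String] in
        (0..<match.numberOfRanges).map { index -> String in
            // A group that did not participate in the match becomes an empty string.
            Range(match.range(at: index), in: text).map { String(text[$0]) } ?? ""
        }
    }
}

// Toy input with two matches, each with two capture groups.
let result = captureGroups(in: "GET 200, POST 500", pattern: #"([A-Z]+) (\d+)"#)
// result == [["GET 200", "GET", "200"], ["POST 500", "POST", "500"]]

// Outer index selects the match; inner index 0 is the whole match, 1 and 2 are the groups.
let firstMatch = result[0]
print(firstMatch[1], firstMatch[2])  // prints "GET 200"
```

Because each log line is matched on its own and the pattern is anchored with ^ and $, the outer array holds at most one element per line, which is why only the first inner array is read.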
func parseData() {
    let documentsUrl: URL = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask).first!
    let destinationFileUrl = documentsUrl.appendingPathComponent("logfile.log")
    do {
        let contents = try String(contentsOf: destinationFileUrl, encoding: .utf8)
        let lines = contents.split(separator: "\n")
        // Four dot-separated numbers for the IP; the trailing \\d+ captures a multi-digit last octet in full.
        let pattern = "^((?:\\d+\\.){3}\\d+).*?\\[([^]]*)\\].*?\"([^\"]*)\"\\s*(\\d+)\\s+(\\d+)\\s*\"-\"\\s*\"([^\"]*)\"$"
        for line in lines {
            // Skip lines that do not match the pattern instead of indexing into an empty result.
            guard let subGroup = String(line).groups(for: pattern).first, subGroup.count == 7 else {
                continue
            }
            let ipAddress = subGroup[1]
            let date = subGroup[2]
            let getMethod = subGroup[3]
            let statusCode = subGroup[4]
            let secondStatusCode = subGroup[5]
            let versionInfo = subGroup[6]
            DispatchQueue.main.async {
                self.eventLogs.append(EventLog(ipAddress: ipAddress, date: date, getMethod: getMethod, statusCode: statusCode, secondStatusCode: secondStatusCode, versionInfo: versionInfo))
            }
        }
    } catch {
        print(error.localizedDescription)
    }
}
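Since the question asked for speed over thousands of lines, one possible refinement is to compile the regex once rather than inside groups(for:) on every line. The following is only a sketch under that assumption: `ParsedLogLine`, `LogLineParser`, and `parseLogLines(_:)` are hypothetical names, and the struct is a stand-in for the EventLog class above. I have not benchmarked it.

```swift
import Foundation

// Hypothetical value type standing in for the EventLog class above.
struct ParsedLogLine {
    let ipAddress: String
    let date: String
    let request: String
    let statusCode: String
    let responseSize: String
    let userAgent: String
}

// Hypothetical parser that compiles the pattern once and reuses it for every line.
struct LogLineParser {
    private let regex: NSRegularExpression

    init() throws {
        let pattern = #"^((?:\d+\.){3}\d+).*?\[([^]]*)\].*?"([^"]*)"\s*(\d+)\s+(\d+)\s*"-"\s*"([^"]*)"$"#
        regex = try NSRegularExpression(pattern: pattern)
    }

    // Returns nil when a line does not match, so callers can skip it.
    func parse(_ line: String) -> ParsedLogLine? {
        let range = NSRange(line.startIndex..., in: line)
        guard let match = regex.firstMatch(in: line, options: [], range: range),
              match.numberOfRanges == 7 else { return nil }
        let fields = (1...6).compactMap { index -> String? in
            Range(match.range(at: index), in: line).map { String(line[$0]) }
        }
        guard fields.count == 6 else { return nil }
        return ParsedLogLine(ipAddress: fields[0], date: fields[1], request: fields[2],
                             statusCode: fields[3], responseSize: fields[4], userAgent: fields[5])
    }
}

// Parses every matching line; non-matching lines are dropped.
func parseLogLines(_ contents: String) -> [ParsedLogLine] {
    guard let parser = try? LogLineParser() else { return [] }
    return contents.split(separator: "\n").compactMap { parser.parse(String($0)) }
}

// The two sample lines from the question, plus one line that should be skipped.
let sampleContents = """
123.4.5.1 - - [03/Sep/2013:18:38:48 -0600] "GET /products/car/ HTTP/1.1" 200 3327 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.65 Safari/537.36"
not a log line
123.4.5.6 - - [03/Sep/2013:18:38:58 -0600] "GET /jobs/ HTTP/1.1" 500 821 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:23.0) Gecko/20100101 Firefox/23.0"
"""

let parsedLines = parseLogLines(sampleContents)
for entry in parsedLines {
    print(entry.ipAddress, entry.statusCode, entry.request)
}
```

Building the whole array first and assigning it to the UI-facing property in a single main-queue hop would also avoid dispatching one block per line, though whether that matters depends on the file size.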