I'm trying to read a log file and extract some machine/setting information using regular expressions. Here is a sample from the log:
...
COMPUTER INFO:
Computer Name: TESTCMP02
Windows User Name: testUser99
Time Since Last Reboot: 405 Minutes
Processor: (2 processors) Intel(R) Xeon(R) CPU 5160 @ 3.00GHz
OS Version: 5.1 .number 2600:Service Pack 2
Memory: RAM: 48% used, 3069.6 MB total, 1567.3 MB free
ServerTimeOffSet: -146 Seconds
Use Local Time for Log: True
INITIAL SETTINGS:
Command Line: /SKIPUPDATES
Remote Online: True
INI File: c:\demoapp\system\DEMOAPP.INI
DatabaseName: testdb
SQL Server: 10.254.58.1
SQL UserName: SQLUser
ODBC Source: TestODBC
Dynamic ODBC (not defined): True
...
I would like to capture each 'block' of data, using the header as one group and the data as a second (i.e. "COMPUTER INFO", "Computer Name:.......") and repeat this for each block. The expression I have so far is:
(?s)(\p{Lu}{1,} \p{Lu}{1,}:\r\n)(.*\r\n\r\n)
This pulls out the block into the groups like it should, which is great. But I need to have it repeat the capture, which I can't seem to get. I've tried several grouping expressions, including:
(?s)(?:(\p{Lu}{1,} \p{Lu}{1,}:\r\n)(.*\r\n\r\n))*
which would seem to be correct, but I get back lots of NULL result groups with empty group item values. I'm using the .NET Regex class to apply the expressions. Can anyone help me out here?
You can repeat the contents of a group with a quantifier such as *, +, ?, or {m,n}. For example, (ab)* will match zero or more repetitions of "ab".
By placing part of a regular expression inside round brackets or parentheses, you can group that part of the regular expression together. This allows you to apply a quantifier to the entire group or to restrict alternation to part of the regex. Only parentheses can be used for grouping.
What is a group in regex? A group is a part of a regex pattern enclosed in parentheses. For example, the regular expression (cat) creates a single group containing the letters 'c', 'a', and 't'.
Numbers for Named Capturing Groups. Mixing named and numbered capturing groups is not recommended because flavors are inconsistent in how the groups are numbered. If a group doesn't need to have a name, make it non-capturing using the (?:group) syntax.
It's not possible to get a separate group for each repetition. A repeated group's value is its last match.
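In .NET specifically, the last-match rule applies to Group.Value, while the engine also records each iteration in Group.Captures. Here is a minimal sketch of the difference. The pattern, the input string, and the names RepeatedGroupDemo and AllCaptures are made up for illustration and are not taken from the log above:

```csharp
using System;
using System.Text.RegularExpressions;

public static class RepeatedGroupDemo
{
    // Hypothetical helper: returns every substring captured by the
    // repeated group named "item", in the order it was captured.
    public static string[] AllCaptures(string input)
    {
        // The named group sits under a + quantifier, so it matches once per item.
        Match m = Regex.Match(input, @"^(?:(?<item>\w+);)+$");
        Group item = m.Groups["item"];
        string[] result = new string[item.Captures.Count];
        for (int i = 0; i < result.Length; i++)
        {
            result[i] = item.Captures[i].Value;
        }
        return result;
    }

    public static void Main()
    {
        Match m = Regex.Match("a;b;c;", @"^(?:(?<item>\w+);)+$");
        // Value reflects the last iteration only.
        Console.WriteLine(m.Groups["item"].Value);
        // Captures keeps every iteration.
        Console.WriteLine(string.Join(",", AllCaptures("a;b;c;")));
    }
}
```

Walking Captures works for flat lists, but it does not say which section a field belongs to, so splitting the job into two passes is often easier to follow.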
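You'll need to break this into two problems. First, find each section (this assumes the field lines are indented under their header, because (?!^\S) ends a section at the next line that starts in column one):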
new Regex(@"(?>^[A-Z\s]+:\s*$)\s*(?:(?!^\S).)*", RegexOptions.Singleline | RegexOptions.Multiline);
And then, within each match, use another regex to match each field/value into groups:
new Regex(@"^\s+(?<name>[^:]*):\s*(?<value>.*)$", RegexOptions.Multiline);
The code to use this would look something like this:
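using System.Text.RegularExpressions;

// Pass 1: an all-caps header line, then everything up to the next line that starts in column one.
Regex sectionRegex = new Regex(@"(?>^[A-Z\s]+:\s*$)\s*(?:(?!^\S).)*", RegexOptions.Singleline | RegexOptions.Multiline);
// Pass 2: an indented "name: value" line.
Regex nameValueRegex = new Regex(@"^\s+(?<name>[^:]*):\s*(?<value>.*)$", RegexOptions.Multiline);
MatchCollection sections = sectionRegex.Matches(logData);
foreach (Match section in sections)
{
    MatchCollection nameValues = nameValueRegex.Matches(section.ToString());
    foreach (Match nameValue in nameValues)
    {
        string name = nameValue.Groups["name"].Value;
        // With CRLF line endings, $ matches only before \n, so .* picks up the trailing \r; trim it.
        string value = nameValue.Groups["value"].Value.Trim();
        // OK, do something here.
    }
}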
Another option is a single pattern that repeats a header/content pair:
((?<header>[^:]+:)(?<content>[^\r\n]+)?\r\n)+
or, if you have empty lines between items:
(((?<header>[^:]+:)(?<content>[^\r\n]+)?\r\n)|\r\n)+
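Because the named groups in that pattern repeat, Groups["header"].Value on its own gives only the last line. A rough sketch of reading every line through Group.Captures follows. The class name OnePassDemo, the Parse helper, and the shortened sample log are assumptions for illustration, and so is the rule that an entry with no content is a section header:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class OnePassDemo
{
    // The pattern from above that also tolerates blank lines between items.
    private static readonly Regex LineRegex = new Regex(
        @"(((?<header>[^:]+:)(?<content>[^\r\n]+)?\r\n)|\r\n)+");

    // Hypothetical helper: pairs each "header" capture with the "content"
    // capture that starts exactly where the header ends. Entries without
    // content (assumed here to be section headers) get an empty value.
    public static List<KeyValuePair<string, string>> Parse(string log)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        Match m = LineRegex.Match(log);
        CaptureCollection headers = m.Groups["header"].Captures;
        CaptureCollection contents = m.Groups["content"].Captures;
        int c = 0;
        foreach (Capture h in headers)
        {
            string value = "";
            if (c < contents.Count && contents[c].Index == h.Index + h.Length)
            {
                value = contents[c].Value.Trim();
                c++;
            }
            // [^:]+ can swallow a preceding blank line, so trim the header too.
            string key = h.Value.Trim().TrimEnd(':');
            pairs.Add(new KeyValuePair<string, string>(key, value));
        }
        return pairs;
    }

    public static void Main()
    {
        string log =
            "COMPUTER INFO:\r\n" +
            "Computer Name: TESTCMP02\r\n" +
            "\r\n" +
            "INITIAL SETTINGS:\r\n" +
            "Remote Online: True\r\n";
        foreach (KeyValuePair<string, string> pair in Parse(log))
        {
            Console.WriteLine(pair.Value.Length == 0
                ? "[" + pair.Key + "]"
                : pair.Key + " = " + pair.Value);
        }
    }
}
```

The header and content collections can differ in length, since section headers produce no content capture, which is why the sketch pairs them by position rather than by index.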