Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Regular Expression - Repeating Groups

Tags:

.net

regex

I'm trying to read a log file and extract some machine/setting information using regular expressions. Here is a sample from the log:

...
COMPUTER INFO:
 Computer Name:                 TESTCMP02
 Windows User Name:             testUser99
 Time Since Last Reboot:        405 Minutes
 Processor:                     (2 processors) Intel(R) Xeon(R) CPU            5160  @ 3.00GHz
 OS Version:                    5.1 .number 2600:Service Pack 2
 Memory:                        RAM: 48% used, 3069.6 MB total, 1567.3 MB free
 ServerTimeOffSet:              -146 Seconds 
 Use Local Time for Log:        True

INITIAL SETTINGS:
 Command Line:                  /SKIPUPDATES
 Remote Online:                 True
 INI File:                      c:\demoapp\system\DEMOAPP.INI
 DatabaseName:                  testdb
 SQL Server:                    10.254.58.1
 SQL UserName:                  SQLUser
 ODBC Source:                   TestODBC
 Dynamic ODBC (not defined):    True
...

I would like to capture each 'block' of data, using the header as one group, and the data as a second (i.e. "COMPUTER INFO", "Computer Name:.......") and repeat this for each block. The expression if have so far is

(?s)(\p{Lu}{1,} \p{Lu}{1,}:\r\n)(.*\r\n\r\n)

This pulls out the block into the groups like it should, which is great. But I need to have it repeat the capture, which I can't seem to get. I've tried several grouping expressions, including:

(?s)(?:(\p{Lu}{1,} \p{Lu}{1,}:\r\n)(.*\r\n\r\n))*

which would seem to be correct, but I get back lots of NULL result groups with empty group item values. I'm using the .Net RegEx class to apply the expressions, can anyone help me out here?

like image 484
Jason Avatar asked Nov 06 '09 18:11

Jason


People also ask

How do you repeat a group in regex?

For example, you can repeat the contents of a group with a repeating qualifier, such as *, +, ?, or {m,n}. For example, (ab)* will match zero or more repetitions of "ab".

How do you group together in regular expressions?

By placing part of a regular expression inside round brackets or parentheses, you can group that part of the regular expression together. This allows you to apply a quantifier to the entire group or to restrict alternation to part of the regex. Only parentheses can be used for grouping.

What is regex grouping?

What is Group in Regex? A group is a part of a regex pattern enclosed in parentheses () metacharacter. We create a group by placing the regex pattern inside the set of parentheses ( and ) . For example, the regular expression (cat) creates a single group containing the letters 'c', 'a', and 't'.

Can I use named capture groups?

Numbers for Named Capturing Groups. Mixing named and numbered capturing groups is not recommended because flavors are inconsistent in how the groups are numbered. If a group doesn't need to have a name, make it non-capturing using the (?:group) syntax.


2 Answers

It's not possible to have repeated groups. The group will contain the last match.

You'll need to break this into two problems. First, find each section:

new Regex(@"(?>^[A-Z\s]+:\s*$)\s*(?:(?!^\S).)*", RegexOptions.Singleline | RegexOptions.Multiline);

And then, within each match, use another regex to match each field/value into groups:

new Regex(@"^\s+(?<name>[^:]*):\s*(?<value>.*)$", RegexOptions.Multiline);

The code to use this would look something like this:

Regex sectionRegex = new Regex(@"(?>^[A-Z\s]+:\s*$)\s*(?:(?!^\S).)*", RegexOptions.Singleline | RegexOptions.Multiline);
Regex nameValueRegex = new Regex(@"^\s+(?<name>[^:]*):\s*(?<value>.*)$", RegexOptions.Multiline);
MatchCollection sections = sectionRegex.Matches(logData);
foreach (Match section in sections)
{
    MatchCollection nameValues = nameValueRegex.Matches(section.ToString());
    foreach (Match nameValue in nameValues)
    {
        string name = nameValue.Groups["name"].Value;
        string value = nameValue.Groups["value"].Value;
        // OK, do something here.
    }
}
like image 73
Jeremy Stein Avatar answered Sep 17 '22 18:09

Jeremy Stein


((?<header>[^:]+:)(?<content>[^\r\n]+)?\r\n)+

or, if you have empty lines between items:

(((?<header>[^:]+:)(?<content>[^\r\n]+)?\r\n)|\r\n)+
like image 26
Victor Hurdugaci Avatar answered Sep 18 '22 18:09

Victor Hurdugaci