
Matching Multiple Patterns using Java Regex

Tags: java, regex

I have a file containing records of the following format:

1285957838.880      1 192.168.10.228 TCP_HIT/200 1434 GET http://www.google.com/tools/dlpage/res/c/css/dlpage.css [02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css

Each record has 11 fields ([02/Oct/2010:00:00:38 +0530] counts as a single field).

I want to extract certain fields, say 7, 8 and 9. Is it possible to extract these fields using Java regex?

Can a regex be used to match multiple patterns for the above?

From the above record, I need to extract the following fields:

f1: http://www.google.com/tools/dlpage/res/c/css/dlpage.css  
f2: 02/Oct/2010:00:00:38 +0530  
f3: je02121
Sajja asked May 11 '11 12:05



3 Answers

Do it sequentially rather than all in one pattern: match one field at a time and pick out the ones you need. (If you have many lines like this, split the input into lines first, and extract the compiled Pattern to a constant.)

String input = "1285957838.880      1 192.168.10.228 TCP_HIT/200 1434 GET http://www.google.com/tools/dlpage/res/c/css/dlpage.css [02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
Matcher matcher = Pattern.compile("\\[.*?\\]|\\S+").matcher(input);
int nr = 0;
while (matcher.find()) {
    System.out.println("Match no. " + ++nr + ": '" + matcher.group() + "'");
}

Output:

Match no. 1: '1285957838.880'
Match no. 2: '1'
Match no. 3: '192.168.10.228'
Match no. 4: 'TCP_HIT/200'
Match no. 5: '1434'
Match no. 6: 'GET'
Match no. 7: 'http://www.google.com/tools/dlpage/res/c/css/dlpage.css'
Match no. 8: '[02/Oct/2010:00:00:38 +0530]'
Match no. 9: 'je02121'
Match no. 10: 'NONE/-'
Match no. 11: 'text/css'

Regex Pattern explained:

\\[    match an opening square bracket
.*?    then as few characters as possible, up to a
\\]    closing square bracket
|      or
\\S+   any sequence of one or more non-whitespace characters
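To get from the list of matches to the three fields asked for, the same pattern can be wrapped in a small helper. This is only a sketch: the class name `LogFieldTokenizer` and the methods `tokenize` and `stripBrackets` are made up for illustration, and it assumes every line has at least nine fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class LogFieldTokenizer {
    // Compiled once and reused for every line.
    private static final Pattern FIELD = Pattern.compile("\\[.*?\\]|\\S+");

    // Returns all fields of one log line, in order of appearance.
    static List<String> tokenize(String line) {
        List<String> fields = new ArrayList<>();
        Matcher matcher = FIELD.matcher(line);
        while (matcher.find()) {
            fields.add(matcher.group());
        }
        return fields;
    }

    // Removes one pair of surrounding square brackets, if present.
    static String stripBrackets(String field) {
        if (field.length() >= 2 && field.startsWith("[") && field.endsWith("]")) {
            return field.substring(1, field.length() - 1);
        }
        return field;
    }

    public static void main(String[] args) {
        String line = "1285957838.880      1 192.168.10.228 TCP_HIT/200 1434 GET "
                + "http://www.google.com/tools/dlpage/res/c/css/dlpage.css "
                + "[02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
        List<String> fields = tokenize(line);
        // The question counts fields from 1; list indices start at 0.
        System.out.println("f1: " + fields.get(6));
        System.out.println("f2: " + stripBrackets(fields.get(7)));
        System.out.println("f3: " + fields.get(8));
    }
}
```

The brackets are stripped in a separate step because the pattern matches the date field including them.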
Sean Patrick Floyd answered Oct 06 '22 01:10



Assuming that the only place where spaces are allowed within a field is between the brackets in the date field, and that there are no empty fields, you could use this:

Pattern regex = Pattern.compile(
    "^(?:\\S+\\s+){6}     # first 6 fields\n" +
    "(\\S+)\\s+           # field 7\n" +
    "\\[([^\\]]+)\\]\\s+  # field 8, without the brackets\n" +
    "(\\S+)               # field 9",
    Pattern.MULTILINE | Pattern.COMMENTS);
// subjectString holds the log text (one or more lines)
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    for (int i = 1; i <= regexMatcher.groupCount(); i++) {
        // matched text: regexMatcher.group(i)
        // match start: regexMatcher.start(i)
        // match end: regexMatcher.end(i)
    }
}
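For reading one line at a time, the pattern above could be wrapped roughly as follows. This is a sketch under a few assumptions: the class name `LogRecordFields`, the method `extract` and the group names `f1` to `f3` are invented here (the group names mirror the labels in the question), and the pattern is written on one line with named groups instead of using `Pattern.COMMENTS`.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class, not part of the original answer.
class LogRecordFields {
    // Skip six fields, then capture field 7, the bracketed field 8
    // (without its brackets) and field 9 as named groups.
    private static final Pattern RECORD = Pattern.compile(
        "^(?:\\S+\\s+){6}(?<f1>\\S+)\\s+\\[(?<f2>[^\\]]+)\\]\\s+(?<f3>\\S+)");

    // Returns {f1, f2, f3}, or an empty Optional if the line does not
    // have the expected shape.
    static Optional<String[]> extract(String line) {
        Matcher m = RECORD.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new String[] { m.group("f1"), m.group("f2"), m.group("f3") });
    }

    public static void main(String[] args) {
        String line = "1285957838.880      1 192.168.10.228 TCP_HIT/200 1434 GET "
                + "http://www.google.com/tools/dlpage/res/c/css/dlpage.css "
                + "[02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
        extract(line).ifPresent(f -> {
            System.out.println("f1: " + f[0]);
            System.out.println("f2: " + f[1]);
            System.out.println("f3: " + f[2]);
        });
    }
}
```

Returning an empty `Optional` lets the caller decide what to do with lines that do not match, instead of failing on them.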
Tim Pietzcker answered Oct 05 '22 23:10



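Use split with the regex "\\s+" (\s already covers tabs, and a lazy +? would produce empty strings wherever several spaces occur in a row) and store the result in an array, say s.

Then s[6], s[7] + " " + s[8] and s[9] will be the expected result. Array indices are zero-based, and the date is split in two at its inner space and still carries its square brackets.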
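A minimal sketch of the split approach, assuming the date is the only field that contains a space. The class name `SplitFields` and the method `extract` are made up for illustration.

```java
// Hypothetical helper class, not part of the original answer.
class SplitFields {
    // Returns {field 7, field 8 without brackets, field 9} of one log line.
    static String[] extract(String line) {
        // One or more whitespace characters separate the tokens.
        String[] s = line.trim().split("\\s+");
        if (s.length < 10) {
            throw new IllegalArgumentException("Expected at least 10 tokens, got " + s.length);
        }
        // The date field was split at its inner space: rejoin the halves,
        // then drop the surrounding square brackets.
        String date = s[7] + " " + s[8];
        date = date.substring(1, date.length() - 1);
        return new String[] { s[6], date, s[9] };
    }

    public static void main(String[] args) {
        String line = "1285957838.880      1 192.168.10.228 TCP_HIT/200 1434 GET "
                + "http://www.google.com/tools/dlpage/res/c/css/dlpage.css "
                + "[02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
        for (String field : extract(line)) {
            System.out.println(field);
        }
    }
}
```

Because the date occupies two tokens, every field after it sits one index later than its field number would suggest.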

Amit Gupta answered Oct 06 '22 00:10
