
Java - Pattern matches but fails to capture

Tags: java, regex

Here are three sample lines from my dataset:

|   |   |   |   featureB >= 16104.33 : 18873.52 (1/0)

|   featureA >= 17980.32

featureC = ABC BLAH BLAH blA'H $blah 4/ blah blah

I am trying to come up with a pattern matcher which would capture the following:

  • feature name
  • the relation (=, >=, <)
  • feature value (could be a mix of numbers and/or characters, but never contains a colon)
  • result (the value that comes after the colon and before the bracket; the colon and the result are optional and may not appear on some lines)

I came up with the following pattern, but it fails to capture the feature value:

Pattern.compile("(?:\\|   )*(.*?)(>?=|<)((?!:).)*(?::?)(.*?)(?:\\(.*\\))?")

So basically my aim is for group(1) to contain the feature name, group(2) to contain the relation, group(3) to contain the feature value, and group(4) to contain the result if it exists.

Currently group(1), group(2), and group(4) produce what I'm expecting but group(3) is never captured and is always empty.

I would appreciate any help/advice.

Radix asked Feb 10 '16 15:02


3 Answers

Based on your well-drafted requirements, I came up with this regex to capture all 4 groups (the 4th being the optional one):

^[ |]*(\w+)\s*(>?=|<)\s*([^:]+)(?:\s*:\s*([^()]*))?

Java pattern:

Pattern p = Pattern.compile("^[ |]*(\\w+)\\s*(>?=|<)\\s*([^:]+)(?:\\s*:\\s*([^(]+))?.*$");
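A minimal sketch of how this pattern could be run against the three sample lines from the question. The class name `FeatureLineDemo` and the `parse` helper are illustrative names, not part of the original answer. Because `[^:]+` and `[^(]+` also pick up surrounding spaces, the sketch trims the value and the result.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the regex itself comes from the answer.
class FeatureLineDemo {
    static final Pattern LINE = Pattern.compile(
        "^[ |]*(\\w+)\\s*(>?=|<)\\s*([^:]+)(?:\\s*:\\s*([^(]+))?.*$");

    // Returns {name, relation, value, result}, or null if the line does not match.
    // The result element is null when the line has no ": result" part.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String result = m.group(4);
        return new String[] {
            m.group(1),
            m.group(2),
            m.group(3).trim(),                      // drop the space before the colon
            result == null ? null : result.trim()   // drop the space before the bracket
        };
    }

    public static void main(String[] args) {
        String[] samples = {
            "|   |   |   |   featureB >= 16104.33 : 18873.52 (1/0)",
            "|   featureA >= 17980.32",
            "featureC = ABC BLAH BLAH blA'H $blah 4/ blah blah"
        };
        for (String s : samples) {
            System.out.println(Arrays.toString(parse(s)));
        }
    }
}
```

Checking `group(4)` for `null` matters here: an optional group that did not take part in the match returns `null` rather than an empty string.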


anubhava answered Nov 18 '22 16:11


Group 5 holds the optional bracket content.

^[ |]*(\w+)\s*(>?=|<)\s*([^:]+?)(?:\s*:\s*([^\(]+))?(\(.*)?$

See example @ https://regex101.com/r/bP6xJ4/1
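For use from Java, the backslashes need doubling inside the string literal. A rough sketch, assuming a hypothetical helper `group(line, n)` that returns the n-th group or `null`; the class and method names are only for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the regex is the one shown above, escaped for Java.
class OptionalBracketDemo {
    static final Pattern LINE = Pattern.compile(
        "^[ |]*(\\w+)\\s*(>?=|<)\\s*([^:]+?)(?:\\s*:\\s*([^\\(]+))?(\\(.*)?$");

    // Returns capture group n for the given line, or null if the line
    // does not match or the group did not take part in the match.
    static String group(String line, int n) {
        Matcher m = LINE.matcher(line);
        return m.matches() ? m.group(n) : null;
    }

    public static void main(String[] args) {
        String line = "|   |   |   |   featureB >= 16104.33 : 18873.52 (1/0)";
        System.out.println("value   = " + group(line, 3));
        System.out.println("result  = " + group(line, 4).trim()); // group 4 keeps a trailing space
        System.out.println("bracket = " + group(line, 5));
    }
}
```

Because group 3 is lazy (`[^:]+?`) and is followed by `\s*`, the value comes back without the space that precedes the colon; the result in group 4 still carries the space before the bracket, hence the `trim()`.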

Thorben Stangenberg answered Nov 18 '22 17:11


This appears to work for all of your inputs:

(\s*\|\s*)*(\w+)\s*(<=?|>=?|=)([^:]+)(:(.*)$)?
|--------| |---|   |---------||-----||-|--|-|
     1       2          3        4    5 6

Or, in Java:

Pattern.compile("(\\s*\\|\\s*)*(\\w+)\\s*(<=?|>=?|=)([^:]+)(:(.*)$)?");

group(2) is the feature name, group(3) is the operator, group(4) is the value, and group(6) is the result.
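A short sketch of reading those group numbers back out, assuming an illustrative helper `parse` (not part of the answer) that trims the captured value and result:

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the regex is the one given above.
class NumberedGroupsDemo {
    static final Pattern LINE = Pattern.compile(
        "(\\s*\\|\\s*)*(\\w+)\\s*(<=?|>=?|=)([^:]+)(:(.*)$)?");

    // Returns {name, operator, value, result}, or null if the line does not match.
    // Groups 1 and 5 are structural (the leading pipes and the ":result" wrapper).
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String result = m.group(6);
        return new String[] {
            m.group(2),
            m.group(3),
            m.group(4).trim(),
            result == null ? null : result.trim()
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
            parse("|   |   |   |   featureB >= 16104.33 : 18873.52 (1/0)")));
        System.out.println(Arrays.toString(parse("|   featureA >= 17980.32")));
    }
}
```

Note that group(6) is `(.*)` up to the end of the line, so for the first sample it also contains the trailing `(1/0)`; if only the number is wanted, the bracket part would still need to be stripped.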

This is an excellent resource for testing regular expressions:

http://www.regexplanet.com/advanced/java/index.html

Chris Nitchie answered Nov 18 '22 18:11