Here are three sample lines from my dataset:
| | | | featureB >= 16104.33 : 18873.52 (1/0)
| featureA >= 17980.32
featureC = ABC BLAH BLAH blA'H $blah 4/ blah blah
I am trying to come up with a pattern matcher that captures the feature name, the relation, the feature value, and the result (if present) from each line.
I came up with the following pattern, but it fails to capture the feature value:
Pattern.compile("(?:\\| )*(.*?)(>?=|<)((?!:).)*(?::?)(.*?)(?:\\(.*\\))?")
So basically my aim is for group(1) to contain the feature name, group(2) to contain the relation, group(3) to contain the feature value, and group(4) to contain the result if it exists.
Currently group(1), group(2), and group(4) produce what I'm expecting, but group(3) is never captured and is always empty.
I would appreciate any help/advice.
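Why group(3) looks empty: in the pattern above the * quantifier is applied to the capturing group ((?!:).) itself, and a repeated capturing group only keeps what its last repetition matched. On the first sample line that last repetition is the space just before the colon, so group(3) holds a single space; on the third line it holds the final "h". The sketch below assumes the lines are checked with Matcher.matches() (the question does not say whether matches() or find() is used). RepeatedGroupDemo and group3 are hypothetical names, and the "fixed" variant, which wraps the repetition in a non-capturing group inside group 3, is one minimal change rather than the approach taken in the answers below.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedGroupDemo {
    // The question's pattern: the * sits OUTSIDE group 3, so group 3 keeps
    // only the single character matched by the last repetition.
    static final Pattern ORIGINAL = Pattern.compile(
            "(?:\\| )*(.*?)(>?=|<)((?!:).)*(?::?)(.*?)(?:\\(.*\\))?");

    // Hypothetical variant: the repetition is wrapped in (?:...) and moved
    // INSIDE group 3, so group 3 spans every repetition.
    static final Pattern FIXED = Pattern.compile(
            "(?:\\| )*(.*?)(>?=|<)((?:(?!:).)*)(?::?)(.*?)(?:\\(.*\\))?");

    // Returns group 3 of a whole-line match, or null if the line does not match.
    static String group3(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.matches() ? m.group(3) : null;
    }

    public static void main(String[] args) {
        String line = "| | | | featureB >= 16104.33 : 18873.52 (1/0)";
        // Brackets make the whitespace visible.
        System.out.println("original: [" + group3(ORIGINAL, line) + "]"); // [ ]
        System.out.println("fixed:    [" + group3(FIXED, line) + "]");    // [ 16104.33 ]
    }
}
```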
Based on your well-drafted requirements, I came up with this regex to capture all 4 groups (the 4th being optional):
^[ |]*(\w+)\s*(>?=|<)\s*([^:]+)(?:\s*:\s*([^()]*))?
Java pattern:
Pattern p = Pattern.compile("^[ |]*(\\w+)\\s*(>?=|<)\\s*([^:]+)(?:\\s*:\\s*([^(]+))?.*$");
In the following version, group 5 captures the optional bracketed content:
^[ |]*(\w+)\s*(>?=|<)\s*([^:]+?)(?:\s*:\s*([^\(]+))?(\(.*)?$
See the example at https://regex101.com/r/bP6xJ4/1
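A minimal sketch of applying the group-5 version in Java to the three sample lines, assuming each line is matched as a whole with Matcher.matches(). FeatureLineParser and parse are hypothetical names; the regex is the one above, escaped for a Java string literal.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FeatureLineParser {
    // Groups: 1 = name, 2 = relation, 3 = value,
    // 4 = optional result, 5 = optional bracketed content.
    static final Pattern LINE = Pattern.compile(
            "^[ |]*(\\w+)\\s*(>?=|<)\\s*([^:]+?)(?:\\s*:\\s*([^\\(]+))?(\\(.*)?$");

    // Returns groups 1..5 trimmed (null where a group did not participate),
    // or null if the line does not match at all.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] groups = new String[5];
        for (int i = 1; i <= 5; i++) {
            String g = m.group(i);
            groups[i - 1] = (g == null) ? null : g.trim();
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] samples = {
            "| | | | featureB >= 16104.33 : 18873.52 (1/0)",
            "| featureA >= 17980.32",
            "featureC = ABC BLAH BLAH blA'H $blah 4/ blah blah"
        };
        for (String s : samples) {
            System.out.println(Arrays.toString(parse(s)));
        }
    }
}
```

Trimming is a design choice here: without it, group 4 keeps the space that sits before the opening bracket on the first line.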
This appears to work for all of your inputs:
(\s*\|\s*)*(\w+)\s*(<=?|>=?|=)([^:]+)(:(.*)$)?
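Capturing groups, numbered left to right: 1 = (\s*\|\s*), 2 = (\w+), 3 = (<=?|>=?|=), 4 = ([^:]+), 5 = (:(.*)$), and 6 = (.*), which is nested inside group 5.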
Or, in Java:
Pattern.compile("(\\s*\\|\\s*)*(\\w+)\\s*(<=?|>=?|=)([^:]+)(:(.*)$)?");
group(2) is the feature name, group(3) is the operator, group(4) is the value, and group(6) is the result.
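A short sketch of reading groups 2, 3, 4 and 6 from this pattern, assuming whole-line matching with Matcher.matches(). OperatorRegexDemo and parse are hypothetical names; the value and result are trimmed because ([^:]+) and (.*) keep the surrounding spaces.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OperatorRegexDemo {
    // Pattern from this answer; groups 2, 3, 4 and 6 hold name, operator, value, result.
    static final Pattern LINE = Pattern.compile(
            "(\\s*\\|\\s*)*(\\w+)\\s*(<=?|>=?|=)([^:]+)(:(.*)$)?");

    // Returns {name, operator, value, result} with value/result trimmed
    // (result is null when there is no ':'), or null if the line does not match.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String result = m.group(6);
        return new String[] {
            m.group(2), m.group(3), m.group(4).trim(),
            result == null ? null : result.trim()
        };
    }

    public static void main(String[] args) {
        String[] samples = {
            "| | | | featureB >= 16104.33 : 18873.52 (1/0)",
            "| featureA >= 17980.32",
            "featureC = ABC BLAH BLAH blA'H $blah 4/ blah blah"
        };
        for (String s : samples) {
            System.out.println(Arrays.toString(parse(s)));
        }
    }
}
```

Unlike the group-5 version of the previous answer, this pattern leaves the bracketed part inside group 6, so the first line's result comes out as "18873.52 (1/0)".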
This is an excellent resource for testing regular expressions:
http://www.regexplanet.com/advanced/java/index.html