I'm working on a log parser that should parse a line like this:
ID1 : 0 ID2 : 214 TYPE : ERROR DATE : 2012-01-11 14:08:07.432 CLASS : Maintenance SUBCLASS : Operations
ID1, ID2, TYPE, DATE, CLASS, and SUBCLASS are all keywords and I want to have something like this:
ID1 : 0
ID2 : 214
TYPE : ERROR
DATE : 2012-01-11 14:08:07.432
CLASS : Maintenance
SUBCLASS : Operations
I am really quite new to regex and I have the following:
(ID1|ID2|TYPE|DATE|CLASS|SUBCLASS)\\s*:\\s*(.+?)\\s*[(ID1|ID2|TYPE|DATE|CLASS|SUBCLASS)]
Of course, it does not work.
Any advice will be very much appreciated.
The main problem in your expression is the square brackets. They create a character class, which matches exactly one character out of the set of characters inside it, so the alternation between the brackets is not treated as a list of keywords.
(ID1|ID2|TYPE|DATE|CLASS|SUBCLASS)\\s*:\\s*(.+?)\\s*[(ID1|ID2|TYPE|DATE|CLASS|SUBCLASS)]
                                                    ^                                  ^
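To see what the character class does in practice, here is a small sketch. It is written in Python purely for illustration; the doubled backslashes in the question look like a Java-style string literal, but that is my assumption, so the pattern below uses a raw string with single backslashes instead. The variable names are mine.

```python
import re

# Sample line from the question.
line = ("ID1 : 0 ID2 : 214 TYPE : ERROR DATE : 2012-01-11 14:08:07.432 "
        "CLASS : Maintenance SUBCLASS : Operations")

# The bracketed part on its own: a set of single characters, not keywords.
char_class = re.compile(r"[(ID1|ID2|TYPE|DATE|CLASS|SUBCLASS)]")
single = char_class.fullmatch("S") is not None         # one character from the set
whole = char_class.fullmatch("SUBCLASS") is not None   # a whole keyword

# The original pattern from the question, unchanged apart from the escaping.
broken = re.compile(
    r"(ID1|ID2|TYPE|DATE|CLASS|SUBCLASS)\s*:\s*(.+?)\s*"
    r"[(ID1|ID2|TYPE|DATE|CLASS|SUBCLASS)]"
)
broken_pairs = broken.findall(line)

print(single, whole)
print(broken_pairs)
```

On the sample line the character class consumes the first letter of the next keyword, so that keyword can no longer be matched. Only ID1, TYPE and CLASS come out, and the last pair is lost because no character follows it.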
To fix this, I replaced the character class at the end with a positive lookahead assertion (the group starting with ?=). The lookahead is not part of the match; it only ensures that one of those alternatives comes next. I also added the end of the string, $, to the alternation so that the last key/value pair is matched as well.
(ID1|ID2|TYPE|DATE|CLASS|SUBCLASS)\\s*:\\s*(.+?)\\s*(?=ID1|ID2|TYPE|DATE|CLASS|SUBCLASS|$)
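As a rough sketch of how the corrected pattern could be applied, here it is in Python (again my choice for illustration, with single backslashes in a raw string; in a Java-style string literal the backslashes would stay doubled as above). The names KEYWORDS, fixed and pairs are mine, not part of the question.

```python
import re

# Sample line from the question.
line = ("ID1 : 0 ID2 : 214 TYPE : ERROR DATE : 2012-01-11 14:08:07.432 "
        "CLASS : Maintenance SUBCLASS : Operations")

# Keyword alternation, reused for the key group and for the lookahead.
KEYWORDS = "ID1|ID2|TYPE|DATE|CLASS|SUBCLASS"

# Group 1 is the keyword, group 2 the value. The lookahead stops the lazy
# value group at the next keyword (or at the end of the string) without
# consuming it, so the next match can start right there.
fixed = re.compile(
    r"(" + KEYWORDS + r")\s*:\s*(.+?)\s*(?=" + KEYWORDS + r"|$)"
)

pairs = fixed.findall(line)
for key, value in pairs:
    print(f"{key} : {value}")
```

This yields all six pairs from the sample line, one per printed line. One caveat: a value that itself contains one of the keywords would be cut off at that point, so this only works as long as the keywords never show up inside the values.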
See it here on Regexr, a good tool to test regular expressions!