I am writing a regex to extract pieces of information from an EDIFACT UN code list. There are tens of thousands of codes and I do not want to type them all in, so I am parsing the text file with a regex to pull out the parts I need. The file is structured consistently enough that those parts are easy to identify.
I built the following regex and tested it in Regex Hero, but I cannot get the codeComment group to match everything up to a double line break. I tried the character class [^\n\n], but it still stops at the first line break.
Note: I have selected the Multiline option on Regex Hero.
(?<element>\d+)\s\s(?<elementName>.*)\[[BCI]\]\s+Desc: (?<desc>[^\n]*\s*[^\n]*)
^\s*Repr: (?<type>a(?:n)?)..(?<length>\d+)
^\s*(?<code>\d+)\s*(?<codeName>[^\n]*)
^\s{14}(?<codeComment>[^\n]*)
This is the example text I am using to match.
----------------------------------------------------------------------
1073 Document line action code [B]
Desc: Code indicating an action associated with a line of a
document.
Repr: an..3
1 Included in document/transaction
The document line is included in the
document/transaction.
should capture this as well.
2 Excluded from document/transaction
The document line is excluded from the
document/transaction.
What I want is for codeComment to contain the following:
The document line is included in the
document/transaction.
should capture this as well.
but it is only extracting the first line:
The document line is included in the
In a character class, each character counts once no matter how many times you write it, so [^\n\n] means exactly the same as [^\n]. A character class always matches a single character, so it cannot check for two consecutive line breaks. You can use a negative lookahead assertion instead:
^\s{14}(?<codeComment>(?s)(?:(?!\n\n).)*)
(?s) switches on singleline mode, which lets the dot match line breaks as well.
(?!\n\n) asserts that the next two characters are not both line breaks, so the dot stops consuming just before a blank line.
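For anyone who wants to try the idea outside Regex Hero, here is a rough Python sketch of the same lookahead technique. Python's re module writes named groups as (?P<name>...) and takes its flags separately (re.MULTILINE and re.DOTALL), so the pattern is spelled slightly differently from the .NET one above. The sample text, its 14-space indentation, and the helper name extract_comments are assumptions made for illustration only.

```python
import re

# Python spelling of the pattern: named groups use (?P<name>...), and the
# flags are passed separately instead of using an inline (?s).
COMMENT_RE = re.compile(
    r"^\s{14}(?P<codeComment>(?:(?!\n\n).)*)",
    re.MULTILINE | re.DOTALL,
)


def extract_comments(text):
    """Return each codeComment with line breaks and indentation collapsed."""
    comments = []
    for match in COMMENT_RE.finditer(text):
        parts = match.group("codeComment").splitlines()
        comments.append(" ".join(part.strip() for part in parts))
    return comments


# Assumed layout: comment lines are indented by 14 spaces and entries are
# separated by one blank line.
INDENT = " " * 14
sample = "\n".join([
    "     1    Included in document/transaction",
    INDENT + "The document line is included in the",
    INDENT + "document/transaction.",
    INDENT + "should capture this as well.",
    "",
    "     2    Excluded from document/transaction",
    INDENT + "The document line is excluded from the",
    INDENT + "document/transaction.",
]) + "\n"

# A repeated character in a class adds nothing: both classes behave the same.
same = re.findall(r"[^\n\n]+", "a\n\nb") == re.findall(r"[^\n]+", "a\n\nb")

comments = extract_comments(sample)
for comment in comments:
    print(comment)
```

The raw capture keeps the line breaks and the indentation of the continuation lines, which is why the helper strips and rejoins them. If the file uses Windows line endings, the lookahead would probably need to be (?!\r?\n\r?\n) instead.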