I need to parse an EDI file where the separators are the +, : and ' signs, and the escape (release) character is ?.
First, the data is split into segments:
var data = "NAD+UC+ABC2378::92++XYZ Corp.:Tel ?: ?+90 555 555 11 11:Mobile1?: ?+90 555 555 22 22:Mobile2?: ?+90 555 555 41 71+Duzce+Seferihisar / IZMIR++35460+TR";
var segments = data.Split('\'');
Then each segment is split into data elements by +, and each data element is split into component data elements by :.
var dataElements = segments[0].Split('+');
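For reference, the three-level structure described above (segments on ', data elements on +, component data elements on :) can be sketched as nested splits that skip escaped separators. This is a minimal sketch, not a full EDI parser: it assumes C# 9 top-level statements, assumes ? escapes exactly the next character (including a ? or a '), and uses the even-number-of-? lookbehind technique from the answer below as the separator regex. SplitUnescaped and Unescape are hypothetical helper names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: splits `text` on `separator`, but only where the separator
// is preceded by an even number (including zero) of '?' release characters.
// Relies on .NET's support for variable-length lookbehind.
static string[] SplitUnescaped(string text, char separator) =>
    Regex.Split(text, @"(?<!(?:^|[^?])(?:\?\?)*\?)" + Regex.Escape(separator.ToString()));

// Hypothetical helper: removes release characters ("?+" -> "+", "??" -> "?").
static string Unescape(string value) =>
    Regex.Replace(value, @"\?(.)", "$1", RegexOptions.Singleline);

var data = "NAD+UC+ABC2378::92++XYZ Corp.:Tel ?: ?+90 555 555 11 11:Mobile1?: ?+90 555 555 22 22:Mobile2?: ?+90 555 555 41 71+Duzce+Seferihisar / IZMIR++35460+TR";

// segments (') -> data elements (+) -> component data elements (:).
// Unescaping happens only at the last level, so every split still sees the '?'s.
string[][][] parsed = SplitUnescaped(data, '\'')
    .Select(segment => SplitUnescaped(segment, '+')
        .Select(element => SplitUnescaped(element, ':')
            .Select(component => Unescape(component))
            .ToArray())
        .ToArray())
    .ToArray();

foreach (string[] components in parsed[0])
    Console.WriteLine(string.Join(" | ", components));
```

With this sketch the "++" pairs produce empty data elements rather than being skipped, which keeps later elements in their positions.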
The sample string above is not parsed correctly because of the release character. I have special code that deals with this, but I think it should all be doable using
Regex.Split(data, separator);
I am not familiar with regular expressions and have not found a way to do this so far. The best I have come up with is
string[] lines = Regex.Split(data, @"[^?]\+");
which also consumes the character before each + sign, so that character is dropped from the output:
NA
U
ABC2378::9
+XYZ Corp.:Tel ?: ?+90 555 555 11 11:Mobile1?: ?+90 555 555 22 22:Mobile2?: ?+90 555 555 41 7
Duzc
Seferihisar / IZMI
+3546
TR
The correct result should be:
NAD
UC
ABC2378::92
XYZ Corp.:Tel ?: ?+90 555 555 11 11:Mobile1?: ?+90 555 555 22 22:Mobile2?: ?+90 555 555 41 71
Duzce
Seferihisar / IZMIR
35460
TR
So the question is: is this doable with Regex.Split, and what should the separator regex look like?
I can see that you want to split on plus signs + only if they are not preceded (escaped) by a question mark ?. This can be done with a negative lookbehind:
(?<!\?)\+
This matches a single + sign that is not immediately preceded by a question mark ?.
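A quick check of this pattern against the question's sample, as a minimal sketch assuming C# 9 top-level statements. Because the lookbehind is zero-width, nothing next to the + is consumed, unlike the question's [^?]\+ pattern.

```csharp
using System;
using System.Text.RegularExpressions;

var data = "NAD+UC+ABC2378::92++XYZ Corp.:Tel ?: ?+90 555 555 11 11:Mobile1?: ?+90 555 555 22 22:Mobile2?: ?+90 555 555 41 71+Duzce+Seferihisar / IZMIR++35460+TR";

// Split on '+' only when the character before it is not '?'.
// The lookbehind only inspects that character; it is not part of the separator.
string[] elements = Regex.Split(data, @"(?<!\?)\+");

foreach (string element in elements)
    Console.WriteLine($"[{element}]");
```

This prints ten elements; two of them are empty, one for each "++" in the sample.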
Edit: The problem with the previous expression is that it does not handle cases like ??+, ???+ or ????+; in other words, it does not handle cases where ? is used to escape itself.
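The failure is easy to reproduce. A minimal sketch using the same single-character lookbehind on two made-up inputs (C# 9 top-level statements assumed):

```csharp
using System;
using System.Text.RegularExpressions;

// "A??+B": the first '?' escapes the second one, so this '+' is a real separator
// and the expected result is two elements, "A??" and "B".
string[] doubled = Regex.Split("A??+B", @"(?<!\?)\+");

// "A???+B": "??" is an escaped '?' and "?+" an escaped '+', so no split is expected.
string[] tripled = Regex.Split("A???+B", @"(?<!\?)\+");

Console.WriteLine(doubled.Length); // 1: the single-'?' lookbehind wrongly refuses to split
Console.WriteLine(tripled.Length); // 1: correct here, because only the last '?' is inspected
```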
We can solve this by noticing that if a + is preceded by an odd number of ?, the last ? escapes the +, so we must not split there. If it is preceded by an even number of ?, they escape each other in pairs, leaving the + unescaped, so we should split there.
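The same odd/even rule can also be applied without a regex. This is an alternative technique, not part of the answer: scan left to right and let every ? escape the character that follows it, which makes a separator escaped exactly when it follows an odd run of ?. SplitOnUnescaped is a hypothetical helper name, and this is a sketch (C# 9 top-level statements assumed), not a tested EDI parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: splits on `separator` unless it is escaped by '?'.
// Release characters are kept in the output, matching what Regex.Split returns.
static List<string> SplitOnUnescaped(string text, char separator)
{
    var parts = new List<string>();
    var current = new StringBuilder();
    for (int i = 0; i < text.Length; i++)
    {
        char c = text[i];
        if (c == '?' && i + 1 < text.Length)
        {
            // Copy the release character and the character it escapes verbatim.
            current.Append(c).Append(text[i + 1]);
            i++;
        }
        else if (c == separator)
        {
            parts.Add(current.ToString());
            current.Clear();
        }
        else
        {
            current.Append(c);
        }
    }
    parts.Add(current.ToString());
    return parts;
}

Console.WriteLine(string.Join(" | ", SplitOnUnescaped("A??+B", '+')));  // A?? | B
Console.WriteLine(string.Join(" | ", SplitOnUnescaped("A???+B", '+'))); // A???+B
```

A hand-written scanner like this is linear in the input length and easy to extend to the other separators, at the cost of more code than a single regex.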
From this observation, we need an expression that matches a + only if it is preceded by an even number (including zero) of question marks ?. Here it is:
(?<!(^|[^?])(\?\?)*\?)\+
This relies on the .NET regex engine, which supports variable-length lookbehind.
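A sketch that exercises this pattern on the edge cases from the edit, assuming .NET and C# 9 top-level statements. The groups are rewritten as non-capturing (?:...): Regex.Split adds the text of capturing groups that participate in a match to its result array, so non-capturing groups are the safer habit in split patterns. With this pattern, A+B, A??+B and A????+B should be split, while A?+B, A???+B and ?+B should not.

```csharp
using System;
using System.Text.RegularExpressions;

// Matches '+' only when it is preceded by an even number (including zero) of '?'.
// The lookbehind rejects a '+' whose '?' run (back to a non-'?' or the start) is odd.
const string Separator = @"(?<!(?:^|[^?])(?:\?\?)*\?)\+";

foreach (string input in new[] { "A+B", "A?+B", "A??+B", "A???+B", "A????+B", "?+B" })
{
    string[] parts = Regex.Split(input, Separator);
    Console.WriteLine($"{input,-8} -> {string.Join(" | ", parts)}");
}
```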
string[] lines = Regex.Split(data, @"\+");
Would this meet the requirement?
Here is an edit that does not split on a '+' escaped by '?':
string[] lines = Regex.Split(data, @"(?<!\?)[\+]+");
The trailing + quantifier matches multiple consecutive occurrences of the separator '+', so a run such as '++' is treated as a single separator. If you want an empty element for each extra '+' instead, use:
string[] lines = Regex.Split(data, @"(?<!\?)[\+]");
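The difference between the two patterns shows up on the "++" in the sample. A minimal sketch (C# 9 top-level statements assumed); in EDIFACT-style data, elements are identified by position, so collapsing "++" would shift every later element.

```csharp
using System;
using System.Text.RegularExpressions;

string sample = "Seferihisar / IZMIR++35460+TR";

// Trailing '+' quantifier: a run of separators is treated as one,
// so the empty data element between "IZMIR" and "35460" disappears.
string[] collapsed = Regex.Split(sample, @"(?<!\?)[\+]+");

// Single separator: each '+' splits, so "++" yields an empty element.
string[] positional = Regex.Split(sample, @"(?<!\?)[\+]");

Console.WriteLine(string.Join(" | ", collapsed));  // Seferihisar / IZMIR | 35460 | TR
Console.WriteLine(string.Join(" | ", positional)); // Seferihisar / IZMIR |  | 35460 | TR
```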