
Splitting a string on / when not within [ ]

I'm trying to split a string representing an XPath such as:

string myPath = "/myns:Node1/myns:Node2[./myns:Node3=123456]/myns:Node4";

I need to split on '/' (the '/' excluded from results, as with a normal string split) unless the '/' happens to be within the '[ ... ]' (where the '/' would both not be split on, and also included in the result).

Here is what a normal string[] result = myPath.Split("/".ToCharArray()) gets me:

result[0]: //Empty string, this is ok
result[1]: myns:Node1
result[2]: myns:Node2[.
result[3]: myns:Node3=123456]
result[4]: myns:Node4

result[2] and result[3] should essentially be combined, and I should end up with:

result[0]: //Empty string, this is ok
result[1]: myns:Node1
result[2]: myns:Node2[./myns:Node3=123456]
result[3]: myns:Node4

Since I'm not fluent in regex, I've tried manually recombining the results into a new array after the split. That is trivial to get working for this example, but regex seems the better option in case I get more complex XPaths.

For the record, I have looked at the following questions:
Regex split string preserving quotes
C# Regex Split - commas outside quotes
Split a string that has white spaces, unless they are enclosed within "quotes"?

While they should be sufficient to help me with my problem, I'm running into a few issues and confusing aspects that prevent them from helping me.
In the first 2 links, as a newbie to regex I'm finding them hard to interpret and learn from. They are looking for quotes, which look identical between left and right pairs, so translating it to [ and ] is confusing me, and trial and error is not teaching me anything, rather, it's just frustrating me more. I can understand fairly basic regex, but what these answers do is a little more than what I currently understand, even with the explanation in the first link.
In the third link, I won't have access to LINQ as the code will be used in an older version of .NET.

asked Nov 29 '16 16:11 by Code Stranger



1 Answer

XPath is a complex language, and trying to split an XPath expression on slashes at ground level fails in many situations. Examples:

/myns:Node1/myns:Node2[./myns:Node3=123456]/myns:Node4
string(/myns:Node1/myns:Node2)

I suggest another approach to cover more cases. Instead of trying to split, try to match each part between slashes with the Regex.Matches(String, String) method. The advantage of this approach is that you can freely describe what these parts look like:

string pattern = @"(?xs)
    [^][/()]+ # all that isn't a slash or a bracket
    (?: # predicates (possibly nested)
        \[ 
        (?: [^]['""] | (?<c>\[) | (?<-c>] )
          | "" (?> [^""\\]* (?: \\. [^""\\]* )* ) "" # quoted parts
          | '  (?> [^'\\]*  (?: \\. [^'\\]*  )* ) '
        )*?
        (?(c)(?!)) # fail if a [ is still open, so brackets must be balanced
        ]
      |  # same thing for round brackets
        \(
        (?: [^()'""] | (?<d>\() | (?<-d>\) )
          | "" (?> [^""\\]* (?: \\. [^""\\]* )* ) ""
          | '  (?> [^'\\]*  (?: \\. [^'\\]*  )* ) '
        )*?
        (?(d)(?!))
        \)
    )*
  |
    (?<![^/])(?![^/]) # empty string between slashes, at the start or end
";
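
As a rough sketch of how the matches can be collected without LINQ, the snippet below runs Regex.Matches and copies each Match.Value into a string[]. To stay short it uses a deliberately simplified stand-in pattern (non-nested square-bracket predicates only, with no quotes or parentheses) rather than the full pattern above. The names XPathSteps, SimplePattern and SplitSteps are made up for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class XPathSteps
{
    // Simplified stand-in (an assumption, NOT the full pattern from the answer):
    // a run of characters that are neither slashes nor square brackets,
    // followed by any number of non-nested [...] predicates,
    // or an empty string between slashes, at the start or at the end.
    private const string SimplePattern =
        @"[^/\[\]]+(?:\[[^\]]*\])*|(?<![^/])(?![^/])";

    // Returns the value of every match, in order, as a plain array.
    public static string[] SplitSteps(string path)
    {
        MatchCollection matches = Regex.Matches(path, SimplePattern);
        string[] steps = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            steps[i] = matches[i].Value;
        }
        return steps;
    }
}
```

For the path in the question, XPathSteps.SplitSteps(myPath) should return the four entries the asker wants: an empty string, myns:Node1, myns:Node2[./myns:Node3=123456] and myns:Node4. Swapping in the full pattern only changes the constant (and then RegexOptions are not needed, since the pattern sets (?xs) inline).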

Note: to be sure that the string is entirely parsed, you can add something like |\z(?<=(.)) at the end of the pattern. You can then test whether the capturing group matched to know if you have reached the end of the string. (You can also compare the match position and length with the length of the string.)
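
The position-based check mentioned in the note could be sketched as follows. This is only one possible reading of it: walk the matches in order, require each one to start exactly where the previous one ended plus a single '/', and require the last one to end at the end of the string. The names XPathCoverage and CoversWholeString are hypothetical, and the pattern is passed in by the caller so the check does not depend on any particular pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class XPathCoverage
{
    // Returns true when the matches of 'pattern', joined by single slashes,
    // account for every character of 'path'.
    public static bool CoversWholeString(string path, string pattern)
    {
        int expected = 0; // index where the next match has to begin
        foreach (Match m in Regex.Matches(path, pattern))
        {
            if (m.Index != expected)
            {
                return false; // the regex skipped over some characters
            }
            expected = m.Index + m.Length;
            if (expected < path.Length)
            {
                if (path[expected] != '/')
                {
                    return false; // the match stopped before a separator
                }
                expected++; // step over the separating slash
            }
        }
        return expected == path.Length;
    }
}
```

With a pattern that cannot consume a stray bracket, an input such as "a]b" makes the check return false, because the match for "b" does not start where the check expects it.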

demo

answered Sep 20 '22 17:09 by Casimir et Hippolyte