I'm trying to split a string representing an XPath such as:
string myPath = "/myns:Node1/myns:Node2[./myns:Node3=123456]/myns:Node4";
I need to split on '/' (the '/' excluded from results, as with a normal string split) unless the '/' happens to be within the '[ ... ]' (where the '/' would both not be split on, and also included in the result).
A normal split with string[] result = myPath.Split("/".ToCharArray()); gets me:
result[0]: //Empty string, this is ok
result[1]: myns:Node1
result[2]: myns:Node2[.
result[3]: myns:Node3=123456]
result[4]: myns:Node4
result[2] and result[3] should essentially be combined, and I should end up with:
result[0]: //Empty string, this is ok
result[1]: myns:Node1
result[2]: myns:Node2[./myns:Node3=123456]
result[3]: myns:Node4
Since I'm not super fluent in regex, I've tried manually recombining the results into a new array after the split, but what concerns me is that while it's trivial to get it to work for this example, regex seems the better option in the case where I get more complex xpaths.
For the record, I have looked at the following questions:
Regex split string preserving quotes
C# Regex Split - commas outside quotes
Split a string that has white spaces, unless they are enclosed within "quotes"?
While they should be sufficient to help me with my problem, I'm running into a few issues and confusing aspects that prevent them from helping me.
In the first 2 links, as a newbie to regex I'm finding them hard to interpret and learn from. They are looking for quotes, which look identical between left and right pairs, so translating it to [ and ] is confusing me, and trial and error is not teaching me anything, rather, it's just frustrating me more. I can understand fairly basic regex, but what these answers do is a little more than what I currently understand, even with the explanation in the first link.
In the third link, I won't have access to LINQ as the code will be used in an older version of .NET.
XPath is a complex language, and trying to split an XPath expression on top-level slashes fails in many situations. For example:
/myns:Node1/myns:Node2[./myns:Node3=123456]/myns:Node4
string(/myns:Node1/myns:Node2)
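To make the failure concrete, here is a small sketch of what a plain split does to the second expression. The class and method names (NaiveSplitDemo, NaiveSplit) are made up for illustration.

```csharp
using System;

public static class NaiveSplitDemo
{
    // Hypothetical helper: splits on every '/', ignoring brackets entirely.
    public static string[] NaiveSplit(string xpath)
    {
        return xpath.Split('/');
    }

    public static void Main()
    {
        // The slashes sit inside the function call, so the call is torn apart.
        foreach (string part in NaiveSplit("string(/myns:Node1/myns:Node2)"))
        {
            Console.WriteLine("[" + part + "]");
        }
        // Prints:
        // [string(]
        // [myns:Node1]
        // [myns:Node2)]
    }
}
```

None of the three pieces is a usable XPath fragment on its own, which is why I'd rather describe the parts than the separators.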
I suggest another approach that covers more cases. Instead of trying to split, try to match each part between slashes with the Regex.Matches(String, String) method. The advantage of this approach is that you can freely describe what these parts look like:
string pattern = @"(?xs)
    [^][/()]+ # all that isn't a slash or a bracket
    (?: # predicates (possibly nested)
        \[
        (?: [^]['""] | (?<c>\[) | (?<-c>] )
          | "" (?> [^""\\]* (?: \\. [^""\\]* )* ) "" # quoted parts
          | '  (?> [^'\\]*  (?: \\. [^'\\]*  )* ) '
        )*?
        (?(c)(?!)) # fail if brackets are not balanced
        ]
      | # same thing for round brackets
        \(
        (?: [^()'""] | (?<d>\() | (?<-d>\) )
          | "" (?> [^""\\]* (?: \\. [^""\\]* )* ) ""
          | '  (?> [^'\\]*  (?: \\. [^'\\]*  )* ) '
        )*?
        (?(d)(?!))
        \)
    )*
  |
    (?<![^/])(?![^/]) # empty string between slashes, at the start or end
";
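Here is a minimal sketch of how the pattern might be used with Regex.Matches. The names XPathStepDemo and SplitSteps are made up for illustration. The pattern is repeated without comments so that the block stands on its own, with the balance checks written as the always-failing (?!). A plain loop is used instead of LINQ so it should also work on older versions of .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class XPathStepDemo
{
    // The pattern from this answer, comments stripped (illustrative copy).
    public const string Pattern = @"(?xs)
        [^][/()]+
        (?:
            \[
            (?: [^]['""] | (?<c>\[) | (?<-c>] )
              | "" (?> [^""\\]* (?: \\. [^""\\]* )* ) ""
              | '  (?> [^'\\]*  (?: \\. [^'\\]*  )* ) '
            )*?
            (?(c)(?!))
            ]
          |
            \(
            (?: [^()'""] | (?<d>\() | (?<-d>\) )
              | "" (?> [^""\\]* (?: \\. [^""\\]* )* ) ""
              | '  (?> [^'\\]*  (?: \\. [^'\\]*  )* ) '
            )*?
            (?(d)(?!))
            \)
        )*
      |
        (?<![^/])(?![^/])
    ";

    // Hypothetical helper: returns one array entry per match, in order.
    public static string[] SplitSteps(string xpath)
    {
        MatchCollection matches = Regex.Matches(xpath, Pattern);
        string[] parts = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            parts[i] = matches[i].Value;
        }
        return parts;
    }

    public static void Main()
    {
        string myPath = "/myns:Node1/myns:Node2[./myns:Node3=123456]/myns:Node4";
        string[] result = SplitSteps(myPath);
        for (int i = 0; i < result.Length; i++)
        {
            Console.WriteLine("result[" + i + "]: " + result[i]);
        }
        // Prints:
        // result[0]: 
        // result[1]: myns:Node1
        // result[2]: myns:Node2[./myns:Node3=123456]
        // result[3]: myns:Node4
    }
}
```

The leading empty entry comes from the last alternative of the pattern, so the result has the same shape as what String.Split would give for the simple cases.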
Note: to be sure that the string is entirely parsed, you can add something like |\z(?<=(.)) at the end of the pattern. This way, you can test whether the capturing group succeeded to know if you reached the end of the string. (But you can also use the match position, the match length, and the length of the string.)
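The position-based check could look something like the sketch below. It assumes that consecutive matches are separated by exactly one '/', which is what the pattern produces. CoverageCheck, CoversWholeInput and SimplePattern are made-up names, and SimplePattern is a deliberately simplified stand-in (flat square-bracket predicates only) to keep the example short, not the full pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CoverageCheck
{
    // Simplified stand-in pattern (assumption): a step with optional flat [...]
    // predicates, or an empty string between slashes / at the start or end.
    public const string SimplePattern = @"[^][/()]+(?:\[[^]]*])*|(?<![^/])(?![^/])";

    // Returns true when the matches, joined by single slashes, rebuild the input.
    public static bool CoversWholeInput(string input, string pattern)
    {
        int lastEnd = -1; // -1 means no match has been seen yet
        foreach (Match m in Regex.Matches(input, pattern))
        {
            if (lastEnd < 0)
            {
                // The first match has to start at the beginning of the input.
                if (m.Index != 0) return false;
            }
            else if (m.Index != lastEnd + 1 || input[lastEnd] != '/')
            {
                // Something other than a single '/' was skipped between matches.
                return false;
            }
            lastEnd = m.Index + m.Length;
        }
        // The last match has to end exactly at the end of the input.
        return lastEnd == input.Length;
    }

    public static void Main()
    {
        Console.WriteLine(CoversWholeInput("/a/b[c/d]", SimplePattern)); // True
        Console.WriteLine(CoversWholeInput("a/b]", SimplePattern));      // False (stray ']')
    }
}
```

If the check returns false, some part of the expression was skipped by the pattern and the list of matches should not be trusted.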