If you have this code:
"......".Split(new String[]{"...", ".."}, StringSplitOptions.None);
The resulting array elements are:
1. ""
2. ""
3. ""
Now if you reverse the order of the separators,
"......".Split(new String[]{"..", "..."}, StringSplitOptions.None);
The resulting array elements are:
1. ""
2. ""
3. ""
4. ""
From these 2 examples I feel inclined to conclude that the Split method recursively tokenizes as it goes through each element of the array from left to right.
However, once separators containing alphanumeric characters are thrown into the mix, it becomes clear that the above theory is wrong.
"5.x.7".Split(new String[]{".x", "x."}, StringSplitOptions.None)
results in: 1. "5" 2. ".7"
"5.x.7".Split(new String[]{"x.", ".x"}, StringSplitOptions.None)
results in: 1. "5" 2. ".7"
This time we obtain the same output, which means that the rule theorized from the first set of examples no longer applies (i.e., if separator precedence were always determined by the position of the separator within the array, then in the last example we would have obtained "5." & "7" instead of "5" & ".7").
As to why I am wasting my time trying to guess how .NET standard APIs work: I want to implement similar functionality for my Java apps, but neither StringTokenizer nor org.apache.commons.lang.StringUtils provides the ability to split a String using multiple multi-character separators (and even if I were to find an API that does provide this ability, it would be hard to know whether it always tokenizes using the same algorithm as the String.Split method).
Use the String.split() method to split a string with multiple separators, e.g. str.split(/[-_]+/). The split method can be passed a regular expression containing multiple characters to split the string with multiple separators.
The string split() method breaks a given string around matches of the given regular expression. After splitting against the given regular expression, this method returns a string array.
We just have to define an input string we want to split and a pattern. The next step is to apply a pattern. A pattern can match zero or multiple times. To split by different delimiters, we should just set all the characters in the pattern.
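For the Java side of the question, one possible sketch of the pattern idea is below. It assumes the separators should be treated as literal text, so each one is quoted and the results are joined into a regex alternation. The class name RegexMultiSplit is made up for illustration. Java tries the alternatives left to right at each position, so earlier separators win ties at the same index, and the -1 limit keeps trailing empty strings. Returning the input unchanged when no usable separator is given is my own simplification, not what .NET does.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of any library.
final class RegexMultiSplit {
    static String[] split(String input, String... separators) {
        // Quote each separator so characters like '.' are matched literally,
        // then join them into one alternation such as \Q...\E|\Q..\E.
        String regex = Arrays.stream(separators)
                .filter(s -> s != null && !s.isEmpty())
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        if (regex.isEmpty()) {
            // Assumption: with no usable separator, return the input whole.
            return new String[] { input };
        }
        // A negative limit keeps trailing empty strings in the result.
        return Pattern.compile(regex).split(input, -1);
    }
}
```

With this sketch, RegexMultiSplit.split("......", "...", "..") gives three empty strings and RegexMultiSplit.split("......", "..", "...") gives four, the same counts as the two C# calls in the question.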
String split() Method: The str.split() function is used to split the given string into an array of strings by separating it into substrings using a specified separator provided in the argument.
From MSDN:
To avoid ambiguous results when strings in separator have characters in common, the Split operation proceeds from the beginning to the end of the value of the instance, and matches the first element in separator that is equal to a delimiter in the instance. The order in which substrings are encountered in the instance takes precedence over the order of elements in separator.
So, in the first case ".." and "..." are found at the same position, and their order in separator determines which one is used. In the second case, ".x" is found before "x.", so the order of elements in separator does not come into play.
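The rule quoted above can be sketched in plain Java without regular expressions. This is only my reading of the documented behaviour, not the actual .NET code: walk the input from left to right and, at each position, take the first separator in array order that matches there. The class name OrderedSplit is hypothetical, and only the StringSplitOptions.None behaviour is covered. Null or empty separators are skipped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating "position first, array order second".
final class OrderedSplit {
    static List<String> split(String input, String... separators) {
        List<String> parts = new ArrayList<>();
        int tokenStart = 0; // where the current token begins
        int i = 0;          // current scan position
        while (i < input.length()) {
            String hit = null;
            // Array order only matters among separators matching at position i.
            for (String sep : separators) {
                if (sep != null && !sep.isEmpty() && input.startsWith(sep, i)) {
                    hit = sep;
                    break;
                }
            }
            if (hit == null) {
                i++; // no separator starts here, keep scanning
            } else {
                parts.add(input.substring(tokenStart, i));
                i += hit.length(); // consume the separator
                tokenStart = i;
            }
        }
        parts.add(input.substring(tokenStart)); // trailing token, possibly empty
        return parts;
    }
}
```

For "5.x.7" both separator orders give "5" and ".7", because ".x" matches at index 1 before "x." could match at index 2, and by then the "x" has already been consumed.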
I've had a quick look at this, and it would appear that the private method MakeSeparatorList in the string class actually retrieves an array of indexes, but it will match the first one it finds.
So, because .x comes before x. in both of your examples, that index is stored.
This is the code I used to test:
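// Requires: using System.Diagnostics; using System.Linq; using System.Reflection;
// MakeSeparatorList is a private implementation detail, so this is for exploration only.
var s = "5.x.7";
string[] separators = new string[] { "x.", ".x" };
int[] sepList = new int[1024];    // receives the index of each separator found
int[] lengthList = new int[1024]; // receives the length of each separator found
// Pick the last method named MakeSeparatorList via reflection and call it on s.
MethodInfo dynMethod = s.GetType().GetMethods(BindingFlags.NonPublic | BindingFlags.Instance).Last(x => x.Name == "MakeSeparatorList");
dynMethod.Invoke(s, new object[] { separators, sepList, lengthList });
Debugger.Break(); // inspect sepList and lengthList here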
In the debugger, notice how the stored index is 1 (which corresponds to .x) even though .x is the second entry in the array.
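To reproduce that observation on the Java side, here is a rough analogue of the index-collecting step. It is loosely modelled on the description above, not on the real MakeSeparatorList, and the name SeparatorScan is made up. It records an {index, length} pair for each separator occurrence instead of cutting substrings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical analogue of the index-collecting pass described above.
final class SeparatorScan {
    // Returns one {index, length} pair per separator occurrence, in input order.
    static List<int[]> scan(String input, String... separators) {
        List<int[]> hits = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int matchedLength = 0;
            for (String sep : separators) {
                if (sep != null && !sep.isEmpty() && input.startsWith(sep, i)) {
                    matchedLength = sep.length(); // first match in array order wins
                    break;
                }
            }
            if (matchedLength > 0) {
                hits.add(new int[] { i, matchedLength });
                i += matchedLength; // skip past the separator just recorded
            } else {
                i++;
            }
        }
        return hits;
    }
}
```

For "5.x.7" the scan records a single hit at index 1 with length 2, whichever order the separators { "x.", ".x" } are passed in.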