
How does the String.Split method determine separator precedence when passed multiple multi-character separators?

If you have this code:

"......".Split(new String[]{"...", ".."}, StringSplitOptions.None);

The resulting array elements are:

 1. ""
 2. ""
 3. ""

Now if you reverse the order of the separators,

"......".Split(new String[]{"..", "..."}, StringSplitOptions.None);

The resulting array elements are:

 1. ""
 2. ""
 3. ""
 4. ""

From these two examples, I feel inclined to conclude that the Split method tokenizes recursively, going through each element of the separator array from left to right.

However, once separators containing alphanumeric characters enter the equation, it is clear that this theory is wrong.

  "5.x.7".Split(new String[]{".x", "x."}, StringSplitOptions.None)

The resulting array elements are:

 1. "5"
 2. ".7"

  "5.x.7".Split(new String[]{"x.", ".x"}, StringSplitOptions.None)

The resulting array elements are:

 1. "5"
 2. ".7"

This time we obtain the same output either way, which means that the rule theorized from the first set of examples no longer applies. (That is, if separator precedence were always determined by the position of the separator within the array, the last example would have produced "5." and "7" instead of "5" and ".7".)

As to why I am wasting my time trying to guess how the .NET standard APIs work: I want to implement similar functionality for my Java apps, but neither StringTokenizer nor org.apache.commons.lang.StringUtils provides the ability to split a String using multiple multi-character separators. (Even if I were to find an API that does provide this ability, it would be hard to know whether it always tokenizes using the same algorithm as the String.Split method.)

asked Feb 07 '13 22:02 by John Smith



2 Answers

From MSDN:

To avoid ambiguous results when strings in separator have characters in common, the Split operation proceeds from the beginning to the end of the value of the instance, and matches the first element in separator that is equal to a delimiter in the instance. The order in which substrings are encountered in the instance takes precedence over the order of elements in separator.

So, in the first case, ".." and "..." are both found at the same position, and their order in separator determines which one is used. In the second case, ".x" is found before "x." in the string, so the order of elements in separator does not come into play.
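For the Java side of the question, the rule quoted above (the earliest position in the string wins, and the order in separator only breaks ties at the same position) appears to coincide with how java.util.regex treats an alternation of quoted literals. The following is a minimal sketch under that assumption, not a verified drop-in equivalent of String.Split. The class name RegexMultiSplit and the helper splitAny are hypothetical, and skipping null or empty separators is a choice made for this sketch.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class RegexMultiSplit {

    /** Hypothetical helper: splits input on any of the given literal separators. */
    static String[] splitAny(String input, String... separators) {
        StringBuilder alternation = new StringBuilder();
        for (String sep : separators) {
            if (sep == null || sep.isEmpty()) {
                continue; // assumption of this sketch: unusable separators are ignored
            }
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            // Pattern.quote makes characters such as '.' match literally.
            alternation.append(Pattern.quote(sep));
        }
        if (alternation.length() == 0) {
            return new String[] { input }; // nothing to split on
        }
        // The regex engine scans left to right and, at each position, tries the
        // alternatives in the order they were listed. A limit of -1 keeps
        // trailing empty strings.
        return Pattern.compile(alternation.toString()).split(input, -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitAny("......", "...", ".."))); // [, , ]
        System.out.println(Arrays.toString(splitAny("......", "..", "..."))); // [, , , ]
        System.out.println(Arrays.toString(splitAny("5.x.7", "x.", ".x")));   // [5, .7]
    }
}
```

The limit of -1 is what preserves the empty elements seen in the "......" examples from the question; with the default limit, Java would drop trailing empty strings.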

answered Nov 09 '22 16:11 by J. Calleja


I've had a quick look at this, and it would appear that the private method MakeSeparatorList in the String class builds an array of the indexes at which separators occur, and at each position it takes the first separator that matches.

So, because .x occurs before x. in the string in both of your examples, that index is stored.

This is the code I used to test:

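using System.Diagnostics;  // Debugger
using System.Linq;         // Last
using System.Reflection;   // BindingFlags, MethodInfo

var s = "5.x.7";

string[] separators = new string[] { "x.", ".x" };
int[] sepList = new int[1024];     // receives the start index of each separator found
int[] lengthList = new int[1024];  // receives the length of each separator found

// MakeSeparatorList is a private implementation detail, so it may not exist in every .NET version.
// Last(...) is used here to pick the overload that takes a string[] of separators.
MethodInfo dynMethod = s.GetType().GetMethods(BindingFlags.NonPublic | BindingFlags.Instance).Last(x => x.Name == "MakeSeparatorList");
dynMethod.Invoke(s, new object[] { separators, sepList, lengthList });

Debugger.Break();  // inspect sepList and lengthList here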

(The screenshot of the debugger output is not available.)

Notice how the stored index is 1 (the position of .x in the string), even though .x is the second entry in the array.
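The index-recording idea described above can also be sketched in Java without regular expressions. This illustrates the behaviour as described in this answer and in the MSDN quote; it is not a port of MakeSeparatorList. The class IndexListSplit and its splitAny helper are hypothetical names, and the sketch assumes null or empty separators are simply ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class IndexListSplit {

    /** Hypothetical helper: records separator positions first, then cuts the string. */
    static String[] splitAny(String input, String... separators) {
        List<Integer> sepStarts = new ArrayList<>();   // start index of each separator found
        List<Integer> sepLengths = new ArrayList<>();  // length of each separator found

        // Pass 1: walk the string left to right. At each position, take the
        // first separator in array order that matches there, then skip past it.
        int i = 0;
        while (i < input.length()) {
            int matchedLength = 0;
            for (String sep : separators) {
                if (sep != null && !sep.isEmpty() && input.startsWith(sep, i)) {
                    matchedLength = sep.length();
                    break;
                }
            }
            if (matchedLength > 0) {
                sepStarts.add(i);
                sepLengths.add(matchedLength);
                i += matchedLength;
            } else {
                i++;
            }
        }

        // Pass 2: cut out the substrings between the recorded separators,
        // keeping empty strings.
        String[] parts = new String[sepStarts.size() + 1];
        int from = 0;
        for (int k = 0; k < sepStarts.size(); k++) {
            parts[k] = input.substring(from, sepStarts.get(k));
            from = sepStarts.get(k) + sepLengths.get(k);
        }
        parts[sepStarts.size()] = input.substring(from);
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitAny("5.x.7", "x.", ".x")));   // [5, .7]
        System.out.println(Arrays.toString(splitAny("......", "..", "..."))); // [, , , ]
    }
}
```

For "5.x.7" the only recorded start index is 1 with length 2, whichever order the two separators are given in, which matches the observation above.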

answered Nov 09 '22 16:11 by Simon Whitehead