I'm facing a problem splitting a String. I want to split a String with some separator but without losing that separator. When we use the somestring.split(String separator) method in Java, it splits the String but removes the separator from the result. I don't want this to happen.
For example, given this code:
String string1="Ram-sita-laxman";
String seperator="-";
string1.split(seperator);
Output:
[Ram, sita, laxman]
but I want the result like the one below instead:
[Ram, -sita, -laxman]
Is there a way to get output like this?
To split a string without removing the delimiter in Python: use the str.split() method to split the string into a list, then use a list comprehension to iterate over the list and add the delimiter to each item.
The Split method extracts the substrings in this string that are delimited by one or more of the strings in the separator parameter, and returns those substrings as elements of an array.
string1.split("(?=-)");
This works because split actually takes a regular expression. What you're actually seeing is a "zero-width positive lookahead".
I would love to explain more but my daughter wants to play tea party. :)
Edit: Back!
To explain this, I will first show you a different split operation:
"Ram-sita-laxman".split("");
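This splits your string on every zero-length string. There is a zero-length string between every character. Therefore, the result is:
["R", "a", "m", "-", "s", "i", "t", "a", "-", "l", "a", "x", "m", "a", "n"]
(On Java 7 and earlier the array also starts with an empty string, ""; since Java 8, a zero-width match at the beginning of the input no longer produces one.)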
Now, I modify my regular expression ("") to only match zero-length strings if they are followed by a dash.
"Ram-sita-laxman".split("(?=-)");
["Ram", "-sita", "-laxman"]
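For anyone who wants to reuse this with an arbitrary separator, here is a minimal sketch. The class and method names (KeepDelimiterSplit, splitKeepingDelimiter) are made up for illustration, and it assumes the separator should be treated as literal text, which is why it goes through Pattern.quote before being placed inside the lookahead.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class KeepDelimiterSplit {

    // Hypothetical helper: splits *before* each occurrence of the separator,
    // so the separator stays at the start of the token that follows it.
    static String[] splitKeepingDelimiter(String input, String separator) {
        // Pattern.quote turns the separator into a literal, so characters
        // such as "." or "|" are not interpreted as regex syntax.
        String lookahead = "(?=" + Pattern.quote(separator) + ")";
        return input.split(lookahead);
    }

    public static void main(String[] args) {
        String[] parts = splitKeepingDelimiter("Ram-sita-laxman", "-");
        System.out.println(Arrays.toString(parts)); // [Ram, -sita, -laxman]
    }
}
```

Without the quoting step, a separator like "." would match any character inside the lookahead and split the string between every letter.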
In that example, the ?= means "lookahead". More specifically, it means "positive lookahead". Why the "positive"? Because you can also have negative lookahead (?!) which will split on every zero-length string that is not followed by a dash:
"Ram-sita-laxman".split("(?!-)");
["R", "a", "m-", "s", "i", "t", "a-", "l", "a", "x", "m", "a", "n"]
You can also have positive lookbehind (?<=) which will split on every zero-length string that is preceded by a dash:
"Ram-sita-laxman".split("(?<=-)");
["Ram-", "sita-", "laxman"]
Finally, you can also have negative lookbehind (?<!) which will split on every zero-length string that is not preceded by a dash:
"Ram-sita-laxman".split("(?<!-)");
["R", "a", "m", "-s", "i", "t", "a", "-l", "a", "x", "m", "a", "n"]
These four expressions are collectively known as the lookaround expressions.
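If you want to try all four side by side, something like the following should do. The class name LookaroundDemo and the helper splitOn are just for illustration, and the outputs in the comments assume Java 8 or later, where a zero-width match at the very start of the input does not produce a leading empty string.

```java
import java.util.Arrays;

final class LookaroundDemo {

    static final String INPUT = "Ram-sita-laxman";

    // Hypothetical helper: splits the sample input on the given lookaround regex.
    static String[] splitOn(String lookaround) {
        return INPUT.split(lookaround);
    }

    public static void main(String[] args) {
        // Positive lookahead: split where the next character is a dash.
        System.out.println(Arrays.toString(splitOn("(?=-)")));   // [Ram, -sita, -laxman]

        // Positive lookbehind: split where the previous character is a dash.
        System.out.println(Arrays.toString(splitOn("(?<=-)")));  // [Ram-, sita-, laxman]

        // Negative lookahead: split everywhere except right before a dash.
        System.out.println(Arrays.toString(splitOn("(?!-)")));

        // Negative lookbehind: split everywhere except right after a dash.
        System.out.println(Arrays.toString(splitOn("(?<!-)")));
    }
}
```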
I just wanted to show an example I encountered recently that combines two of the lookaround expressions. Suppose you wish to split a CapitalCase identifier up into its tokens:
"MyAwesomeClass" => ["My", "Awesome", "Class"]
You can accomplish this using this regular expression:
"MyAwesomeClass".split("(?<=[a-z])(?=[A-Z])");
This splits on every zero-length string that is preceded by a lower case letter ((?<=[a-z])) and followed by an upper case letter ((?=[A-Z])).
This technique also works with camelCase identifiers.
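Here is a small runnable sketch of that identifier split, in case it helps. The class and method names (IdentifierSplit, splitIdentifier) are made up, and it assumes plain ASCII letters, since the character classes are [a-z] and [A-Z].

```java
import java.util.Arrays;

final class IdentifierSplit {

    // Hypothetical helper: splits between a lower case letter and the
    // upper case letter that follows it. Nothing is consumed, because
    // both lookarounds are zero-width.
    static String[] splitIdentifier(String identifier) {
        return identifier.split("(?<=[a-z])(?=[A-Z])");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitIdentifier("MyAwesomeClass"))); // [My, Awesome, Class]
        System.out.println(Arrays.toString(splitIdentifier("myAwesomeClass"))); // [my, Awesome, Class]
    }
}
```

One thing to be aware of: a run of capitals is not broken up, so "parseHTTPResponse" comes back as [parse, HTTPResponse]. Whether that matters depends on your naming conventions.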
It's a bit dodgy, but you could introduce a dummy separator using a replace function. I don't know the Java methods, but in C# it could be something like:
string1.Replace("-", "#-").Split('#');
Of course, you'd need to pick a dummy separator that's guaranteed not to be anywhere else in the string.
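For what it's worth, a rough Java version of the same dummy-separator idea might look like this. The names (DummySeparatorSplit, splitViaMarker) are invented for the example, and the check that rejects a marker already present in the input is my own addition to guard against the problem mentioned above.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class DummySeparatorSplit {

    // Hypothetical helper: puts a marker in front of every separator,
    // then splits on the marker so the separator itself survives.
    static String[] splitViaMarker(String input, String separator, String marker) {
        if (marker.isEmpty() || input.contains(marker)) {
            throw new IllegalArgumentException(
                    "marker must be non-empty and must not occur in the input");
        }
        // String.replace works on literal text, not on a regex.
        String marked = input.replace(separator, marker + separator);
        // String.split takes a regex, so the marker is quoted to keep it literal.
        return marked.split(Pattern.quote(marker));
    }

    public static void main(String[] args) {
        String[] parts = splitViaMarker("Ram-sita-laxman", "-", "#");
        System.out.println(Arrays.toString(parts)); // [Ram, -sita, -laxman]
    }
}
```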