I have a string like this: String str = "la$le\\$li$lo".
I want to split it to get the following output: "la","le\\$li","lo". The \$ is an escaped $, so it should be left in the output.
But when I do str.split("[^\\\\]\\$") I get "l","le\\$l","lo".
From what I understand, my regex is matching a$ and i$ and removing them. Any idea how to get my characters back?
Thanks
Use zero-width matching assertions:
String str = "la$le\\$li$lo";
System.out.println(java.util.Arrays.toString(
str.split("(?<!\\\\)\\$")
)); // prints "[la, le\$li, lo]"
The regex is essentially (?<!\\)\$. It uses a negative lookbehind to assert that the $ is not preceded by a \.
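To make the difference visible, here is a small sketch that lists what each pattern actually matches, which is exactly the text split() discards. The class name DelimiterSpans and the helper consumed are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DelimiterSpans {
    // Returns the text of every match, i.e. what split() would throw away.
    static List<String> consumed(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String str = "la$le\\$li$lo";
        // Pattern from the question: the character before '$' is part of the match.
        System.out.println(consumed("[^\\\\]\\$", str));   // prints "[a$, i$]"
        // Lookbehind version: only the '$' itself is part of the match.
        System.out.println(consumed("(?<!\\\\)\\$", str)); // prints "[$, $]"
    }
}
```

The lookbehind checks the preceding character without including it in the match, so nothing but the $ is removed.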
Simple sentence splitting, keeping punctuation marks:
String str = "Really?Wow!This.Is.Awesome!";
System.out.println(java.util.Arrays.toString(
str.split("(?<=[.!?])")
)); // prints "[Really?, Wow!, This., Is., Awesome!]"
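The same idea works with a lookahead when the delimiter should stay attached to the token that follows it, and with both assertions combined when the delimiter should become a token of its own. A minimal sketch, where the class KeepDelimiters, its method names and the sample input are invented for illustration:

```java
import java.util.Arrays;

class KeepDelimiters {
    // Split before each '+' or '-': the sign stays at the start of the next token.
    static String[] leading(String s) {
        return s.split("(?=[+-])");
    }

    // Split both before and after each sign: the sign becomes its own token.
    static String[] isolated(String s) {
        return s.split("(?<=[+-])|(?=[+-])");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(leading("3+4-5")));  // prints "[3, +4, -5]"
        System.out.println(Arrays.toString(isolated("3+4-5"))); // prints "[3, +, 4, -, 5]"
    }
}
```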
Splitting a long string into fixed-length parts, using \G:
String str = "012345678901234567890";
System.out.println(java.util.Arrays.toString(
str.split("(?<=\\G.{4})")
)); // prints "[0123, 4567, 8901, 2345, 6789, 0]"
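If relying on \G inside a lookbehind feels too clever, a plain substring loop should give the same chunks. This is only a sketch of an alternative, not part of the regex approach; FixedWidth and chunks are hypothetical names, and the size counts Java chars (UTF-16 code units).

```java
import java.util.ArrayList;
import java.util.List;

class FixedWidth {
    // Cuts s into pieces of at most `size` chars; the last piece may be shorter.
    static List<String> chunks(String s, int size) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < s.length(); i += size) {
            parts.add(s.substring(i, Math.min(s.length(), i + size)));
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(chunks("012345678901234567890", 4));
        // prints "[0123, 4567, 8901, 2345, 6789, 0]"
    }
}
```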
Using a lookbehind/lookahead combo:
String str = "HelloThereHowAreYou";
System.out.println(java.util.Arrays.toString(
str.split("(?<=[a-z])(?=[A-Z])")
)); // prints "[Hello, There, How, Are, You]"
The reason a$ and i$ are getting removed is that the regexp [^\\]\$ matches two characters: any character other than '\', followed by '$'. Both become part of the delimiter, so the letter is discarded along with the '$'. You need to use zero-width assertions.
This is the same problem people have trying to find q not followed by u.
A first cut at the proper regexp is /(?<!\\)\$/ ("(?<!\\\\)\\$" in Java).
class Test {
    public static void main(String[] args) {
        String regexp = "(?<!\\\\)\\$";
        System.out.println(java.util.Arrays.toString("1a$1e\\$li$lo".split(regexp)));
    }
}
Yields: [1a, 1e\$li, lo]
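The lookbehind is only a first cut because it treats any $ after a backslash as escaped, even when that backslash is itself escaped. If the format also allows \\ as an escaped backslash (an assumption, the question does not say so), a small hand-written scanner may be easier to get right than a longer regex. A sketch, with EscapeAwareSplit as a made-up name:

```java
import java.util.ArrayList;
import java.util.List;

class EscapeAwareSplit {
    // Splits on '$' unless it is escaped. A backslash escapes exactly the next
    // character (assumed convention), and escape sequences are kept verbatim.
    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\\' && i + 1 < input.length()) {
                // Copy the backslash and the character it escapes as a unit.
                current.append(c).append(input.charAt(++i));
            } else if (c == '$') {
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString());
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("la$le\\$li$lo")); // prints "[la, le\$li, lo]"
        System.out.println(split("a\\\\$b"));       // prints "[a\\, b]"
    }
}
```

Unlike String.split, this version keeps trailing empty fields.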
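You can try first replacing "\$" with another string, such as the URL encoding for $ ("%24"), and then splitting on the remaining unescaped $. This assumes the input does not already contain "%24":
String[] splits = str.replace("\\$", "%24").split("\\$");
for (int i = 0; i < splits.length; i++) {
    splits[i] = splits[i].replace("%24", "\\$"); // restore the escaped $
}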
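More generally, if str is constructed by something like
str = a + "$" + b + "$" + c
then you can URL-encode a, b and c before appending them together:
import static java.net.URLEncoder.encode;
import static java.nio.charset.StandardCharsets.UTF_8;
...
str = encode(a, UTF_8) + "$" + encode(b, UTF_8) + "$" + encode(c, UTF_8); // Charset overload needs Java 10+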
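For completeness, a round trip of that idea might look like the sketch below. EncodedFields, join and parse are hypothetical names, and the Charset overloads of URLEncoder.encode and URLDecoder.decode used here require Java 10 or later.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EncodedFields {
    // Encodes each field so it can no longer contain a raw '$', then joins with '$'.
    static String join(List<String> fields) {
        List<String> encoded = new ArrayList<>();
        for (String f : fields) {
            encoded.add(URLEncoder.encode(f, StandardCharsets.UTF_8));
        }
        return String.join("$", encoded);
    }

    // Splits on every '$' (limit -1 keeps trailing empty fields) and decodes each part.
    static List<String> parse(String joined) {
        List<String> fields = new ArrayList<>();
        for (String part : joined.split("\\$", -1)) {
            fields.add(URLDecoder.decode(part, StandardCharsets.UTF_8));
        }
        return fields;
    }

    public static void main(String[] args) {
        String joined = join(Arrays.asList("la", "le$li", "lo"));
        System.out.println(joined);        // prints "la$le%24li$lo"
        System.out.println(parse(joined)); // prints "[la, le$li, lo]"
    }
}
```

With this scheme no backslash escaping is needed at all, since every $ left in the joined string is a real separator.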