
How to split a string but keep the delimiters in Java? [duplicate]

Tags:

java

regex

I'm trying to recreate the way Discord parses messages that contain emojis.

For example, I want the message Hello, :smile::hearth: world! to split into the following array:

["Hello, ", ":smile:", ":hearth:", " world!"]

I've already tried to split the string with the following code:

Arrays.toString(message.split("(:[A-Za-z]+:)"))

However, the split method removes the delimiters it finds, so only the text between the emojis is left (including an empty string between the two adjacent emojis):

["Hello, ", "", " world!"]
Tjeu Foolen asked May 15 '19 19:05


2 Answers

From your input string and expected result, I infer that you want to split the string according to three rules.

  • Split at any position that is preceded by a colon and followed by a colon
  • Split at any position that is preceded by a space and followed by a colon
  • Split at any position that is preceded by a colon and followed by a space

Hence you can use this regex, which uses alternation to cover all three cases.

(?<=:)(?=:)|(?<= )(?=:)|(?<=:)(?= )

Java code:

import java.util.Arrays;
String s = "Hello, :smile::hearth: world!";
// Split at the zero-width positions described by the three rules above.
System.out.println(Arrays.toString(s.split("(?<=:)(?=:)|(?<= )(?=:)|(?<=:)(?= )")));

This prints your expected output:

[Hello, , :smile:, :hearth:,  world!]
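
Because Arrays.toString joins elements with ", ", the printed line does not show where one element ends and the next begins. The sketch below quotes each element so the boundaries are visible, and also runs the same pattern on a second message to show a limitation: the three rules split at any colon next to a space, not only around emojis. The class name, the quoted helper and the second message are made up for illustration.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class SplitBoundaries {
    // The three-alternative pattern from the answer above.
    static final String RULES = "(?<=:)(?=:)|(?<= )(?=:)|(?<=:)(?= )";

    // Wraps every element in double quotes so leading and trailing spaces are visible.
    static String quoted(String[] parts) {
        return Arrays.stream(parts)
                .map(p -> "\"" + p + "\"")
                .collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        // The message from the question: four elements, spaces preserved.
        System.out.println(quoted("Hello, :smile::hearth: world!".split(RULES)));
        // ["Hello, ", ":smile:", ":hearth:", " world!"]

        // A colon followed by a space also triggers a split, emoji or not.
        System.out.println(quoted("Note: see :smile:".split(RULES)));
        // ["Note:", " see ", ":smile:"]
    }
}
```

If ordinary colons can appear in your messages, the split therefore produces extra pieces, which is worth keeping in mind before relying on these rules.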

Alternatively, if you can match the tokens rather than split the string, the regex becomes much simpler:

:[^:]+:|\S+

Java code:

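import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "Hello, :smile::hearth: world!";
// First alternative: an emoji such as :smile:. Second: any other run of non-whitespace.
Pattern p = Pattern.compile(":[^:]+:|\\S+");
Matcher m = p.matcher(s);
// find() advances to the next token on each call.
while (m.find()) {
    System.out.println(m.group());
}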

This prints the following (note that the whitespace between tokens is dropped):

Hello,
:smile:
:hearth:
world!
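
To keep the spaces exactly as in the array requested in the question, one option is to match only the emojis and copy the text between matches by hand. This is a minimal sketch, assuming the emoji pattern :[A-Za-z]+: from the question; the class name EmojiTokenizer and the method tokenize are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmojiTokenizer {
    // Emoji pattern taken from the question: letters between two colons.
    private static final Pattern EMOJI = Pattern.compile(":[A-Za-z]+:");

    // Returns the emojis and the text between them, in order, without dropping anything.
    static List<String> tokenize(String message) {
        List<String> tokens = new ArrayList<>();
        Matcher m = EMOJI.matcher(message);
        int last = 0; // end of the previous emoji (or 0 at the start)
        while (m.find()) {
            if (m.start() > last) {
                // Plain text between the previous emoji and this one.
                tokens.add(message.substring(last, m.start()));
            }
            tokens.add(m.group());
            last = m.end();
        }
        if (last < message.length()) {
            // Trailing text after the final emoji.
            tokens.add(message.substring(last));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Tokens: "Hello, ", ":smile:", ":hearth:", " world!"
        for (String token : tokenize("Hello, :smile::hearth: world!")) {
            System.out.println("\"" + token + "\"");
        }
    }
}
```

Unlike the split-based version, this does not emit an empty string between two adjacent emojis, and a stray colon in ordinary text stays part of the surrounding text.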
Pushpesh Kumar Rajwanshi answered Oct 26 '22 20:10


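You can use regular-expression lookahead and lookbehind to get the expected result, anchoring both on the emoji pattern itself. A look-behind in Java should have a bounded maximum length, so the emoji name is capped at 32 letters here (an arbitrary limit). Please refer to the code snippet below:

 import java.util.Arrays;

 public class Main {
     public static void main(String[] args) {
         String message = "Hello, :smile::hearth: world!";
         // Split before each :name: (lookahead) and after each :name: (lookbehind).
         System.out.println(Arrays.toString(
                 message.split("(?=:[A-Za-z]+:)|(?<=:[A-Za-z]{1,32}:)")));
     }
 }

This gives the output [Hello, , :smile:, :hearth:,  world!]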

Ajinkyad answered Oct 26 '22 20:10