Is there a way to split strings with String.split() and include the delimiters? [duplicate]

Tags:

java

regex

I have a multiline string which is delimited by a set of different delimiters:

(Text1)(DelimiterA)(Text2)(DelimiterC)(Text3)(DelimiterB)(Text4) 

I can split this string into its parts using String.split, but it seems that I can't get the actual string that matched the delimiter regex.

In other words, this is what I get:

  • Text1
  • Text2
  • Text3
  • Text4

This is what I want:

  • Text1
  • DelimiterA
  • Text2
  • DelimiterC
  • Text3
  • DelimiterB
  • Text4

Is there any JDK way to split the string using a delimiter regex but also keep the delimiters?
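Here is a minimal reproduction of the behavior described above. It assumes the delimiters are the literal words DelimiterA, DelimiterB and DelimiterC and uses a single-line stand-in for the multiline input. The class name, method name and DELIMITERS constant are hypothetical.

```java
import java.util.Arrays;

public class SplitDropsDelimiters {
    // Hypothetical delimiter regex standing in for the question's DelimiterA/B/C.
    static final String DELIMITERS = "DelimiterA|DelimiterB|DelimiterC";

    static String[] plainSplit(String input) {
        // Every match of DELIMITERS is consumed by split(), so it is absent from the result.
        return input.split(DELIMITERS);
    }

    public static void main(String[] args) {
        String input = "Text1DelimiterAText2DelimiterCText3DelimiterBText4";
        System.out.println(Arrays.toString(plainSplit(input)));
        // prints [Text1, Text2, Text3, Text4]
    }
}
```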

Asked Feb 05 '10 at 10:02 by Daniel Rikowski



2 Answers

You can use lookahead and lookbehind, which are features of regular expressions.

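```java
import java.util.Arrays;

System.out.println(Arrays.toString("a;b;c;d".split("(?<=;)")));        // split after each ;
System.out.println(Arrays.toString("a;b;c;d".split("(?=;)")));         // split before each ;
System.out.println(Arrays.toString("a;b;c;d".split("((?<=;)|(?=;))"))); // split on both sides
```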

And you will get:

```
[a;, b;, c;, d]
[a, ;b, ;c, ;d]
[a, ;, b, ;, c, ;, d]
```

The last one is what you want.

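((?<=;)|(?=;)) matches the empty position just after a ; (the lookbehind) or just before a ; (the lookahead), so the string is split on both sides of each delimiter without consuming it.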
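Applied to the question's own input, the same idea becomes a small helper. This is a sketch, not part of the original answer. The method name splitKeepingDelimiters is hypothetical, and the input is a single-line stand-in for the question's text. It also assumes that the delimiter regex has a bounded length (Java lookbehind needs an obvious maximum length, so patterns using + or * may be rejected) and that delimiter matches do not overlap.

```java
import java.util.Arrays;

public class SplitKeepDelimiters {
    // Hypothetical helper: split around every match of delimiterRegex but keep the matches.
    // Assumes delimiterRegex has a bounded length, since it is placed inside a lookbehind.
    static String[] splitKeepingDelimiters(String input, String delimiterRegex) {
        // (?<=d) matches the empty position right after a delimiter,
        // (?=d) matches the empty position right before one.
        String zeroWidth = "(?<=" + delimiterRegex + ")|(?=" + delimiterRegex + ")";
        return input.split(zeroWidth);
    }

    public static void main(String[] args) {
        String input = "Text1DelimiterAText2DelimiterCText3DelimiterBText4";
        String[] parts = splitKeepingDelimiters(input, "DelimiterA|DelimiterB|DelimiterC");
        System.out.println(Arrays.toString(parts));
        // prints [Text1, DelimiterA, Text2, DelimiterC, Text3, DelimiterB, Text4]
    }
}
```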

EDIT: Fabian Steeg's comments on readability are valid. Readability is always a problem with regular expressions. One thing I do to make regular expressions more readable is to store them in a variable whose name describes what the regular expression does. You can even put placeholders (e.g. %1$s) in it and use Java's String.format to replace them with the actual string you need; for example:

```java
public static final String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";

public void someMethod() {
    final String[] aEach = "a;b;c;d".split(String.format(WITH_DELIMITER, ";"));
    // ...
}
```
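One caveat with the WITH_DELIMITER template: the delimiter is inserted as regex syntax, so a delimiter such as "." matches any character. With "10.20.30", the unquoted pattern would split between every character. The minimal sketch below assumes you want literal delimiters and quotes them with java.util.regex.Pattern.quote before formatting. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class WithDelimiterQuoted {
    // Same template as the answer: %1$s is replaced by the delimiter in both lookarounds.
    public static final String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";

    // Hypothetical helper: quoting makes metacharacters such as "." or "|" match literally.
    static String[] splitKeepingLiteral(String input, String delimiter) {
        return input.split(String.format(WITH_DELIMITER, Pattern.quote(delimiter)));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitKeepingLiteral("a;b;c;d", ";")));
        // prints [a, ;, b, ;, c, ;, d]
        System.out.println(Arrays.toString(splitKeepingLiteral("10.20.30", ".")));
        // prints [10, ., 20, ., 30]
    }
}
```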
Answered Sep 22 '22 at 06:09 by NawaMan


You want to use lookarounds, and split on zero-width matches. Here are some examples:

```java
public class SplitNDump {
    static void dump(String[] arr) {
        for (String s : arr) {
            System.out.format("[%s]", s);
        }
        System.out.println();
    }

    public static void main(String[] args) {
        dump("1,234,567,890".split(","));
        // "[1][234][567][890]"
        dump("1,234,567,890".split("(?=,)"));
        // "[1][,234][,567][,890]"
        dump("1,234,567,890".split("(?<=,)"));
        // "[1,][234,][567,][890]"
        dump("1,234,567,890".split("(?<=,)|(?=,)"));
        // "[1][,][234][,][567][,][890]"

        dump(":a:bb::c:".split("(?=:)|(?<=:)"));
        // "[][:][a][:][bb][:][:][c][:]" on Java 7; Java 8+ never produces a leading
        // empty string from a zero-width match at index 0, so it prints the next line's result
        dump(":a:bb::c:".split("(?=(?!^):)|(?<=:)"));
        // "[:][a][:][bb][:][:][c][:]"
        dump(":::a::::b  b::c:".split("(?=(?!^):)(?<!:)|(?!:)(?<=:)"));
        // "[:::][a][::::][b  b][::][c][:]"
        dump("a,bb:::c  d..e".split("(?!^)\\b"));
        // "[a][,][bb][:::][c][  ][d][..][e]"

        dump("ArrayIndexOutOfBoundsException".split("(?<=[a-z])(?=[A-Z])"));
        // "[Array][Index][Out][Of][Bounds][Exception]"
        dump("1234567890".split("(?<=\\G.{4})"));
        // "[1234][5678][90]"

        // Split at the end of each run of a repeated character
        dump("Boooyaaaah! Yippieeee!!".split("(?<=(?=(.)\\1(?!\\1))..)"));
        // "[Booo][yaaaa][h! Yipp][ieeee][!!]"
    }
}
```

And yes, that is a triply nested assertion in the last pattern.
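The sketch below shows that these patterns split on empty matches. It lists the offsets where java.util.regex.Matcher finds the combined lookaround pattern. Each match has start() == end(), which is why String.split removes no characters around it. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZeroWidthSplitPoints {
    // Hypothetical helper: collect the offsets where regex matches in input.
    static List<Integer> splitPoints(String regex, String input) {
        List<Integer> points = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            // Lookarounds consume nothing, so each match should be empty.
            if (m.start() != m.end()) {
                throw new IllegalStateException("match is not zero-width at " + m.start());
            }
            points.add(m.start());
        }
        return points;
    }

    public static void main(String[] args) {
        // (?=,) fires just before each comma, (?<=,) just after it.
        System.out.println(splitPoints("(?<=,)|(?=,)", "1,234,567"));
        // prints [1, 2, 5, 6]
    }
}
```

Cutting "1,234,567" at offsets 1, 2, 5 and 6 gives [1][,][234][,][567]. This is the same pattern as the fourth dump call above.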

Related questions

  • Java split is eating my characters.
  • Can you use zero-width matching regex in String split?
  • How do I convert CamelCase into human-readable names in Java?
  • Backreferences in lookbehind

See also

  • regular-expressions.info/Lookarounds
Answered Sep 24 '22 at 06:09 by polygenelubricants