
How to split a string without losing any word in Java?

Tags: java, string, split

I am using Eclipse for Java.

I want to split an input line without losing any characters.

For example, the input line is:

MAC 4 USD7MAIR 2014 USD1111IMAC 123 USD232MPRO 2-0-1-5

And the output should be:

MAC 4 USD7,MAIR 2014 USD1111,IMAC 123 USD232,MPRO 2-0-1-5

(If I split on "M" or a similar delimiter, the character M itself is removed.)

What should I do?

justDrink asked Feb 23 '26 16:02


1 Answer

You need to use a positive lookahead.

string.split("(?=M)");

or, to also rule out a split at the very start of the string:

string.split("(?<!^)(?=M)");

Example:

String totalString = "MAC 4 USD7MAIR 2014 USD1111IMAC 123 USD232MPRO 2-0-1-5";
String[] parts = totalString.split("(?=M)");
System.out.println(Arrays.toString(parts));

Output:

[MAC 4 USD7, MAIR 2014 USD1111I, MAC 123 USD232, MPRO 2-0-1-5]
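For comparison, here is a minimal sketch of why the lookahead keeps the M while a plain split("M") drops it. The class name KeepDelimiterDemo, the two helper methods, and the shortened input are made up for illustration; only String.split itself comes from the answer above.

```java
import java.util.Arrays;

class KeepDelimiterDemo {
    // Plain split: the matched "M" is consumed as the delimiter and discarded.
    static String[] plainSplit(String s) {
        return s.split("M");
    }

    // Lookahead split: (?=M) matches the empty position before each "M",
    // so nothing is consumed and every character survives.
    static String[] lookaheadSplit(String s) {
        return s.split("(?=M)");
    }

    public static void main(String[] args) {
        String input = "MAC 4 USD7MAIR 2014"; // shortened sample, assumed for the demo

        // The leading "M" is a delimiter too, so the first element is empty.
        System.out.println(Arrays.toString(plainSplit(input)));
        // prints: [, AC 4 USD7, AIR 2014]

        // On Java 8 or later the zero-width match at index 0 adds no empty element.
        System.out.println(Arrays.toString(lookaheadSplit(input)));
        // prints: [MAC 4 USD7, MAIR 2014]
    }
}
```

Joining the lookahead result back together reproduces the original string exactly, which is a quick way to check that no character was lost.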

Update:

Splitting before every M also cuts IMAC into I and MAC, as the output above shows. The regex below instead splits at the boundary immediately after USD\d+, where \d+ means one or more digits.

String totalString = "MAC 4 USD7MAIR 2014 USD1111IMAC 123 USD232MPRO 2-0-1-5";
String[] parts = totalString.split("(?<=\\bUSD\\d{1,99}+)");
System.out.println(Arrays.toString(parts));

Output:

[MAC 4 USD7, MAIR 2014 USD1111, IMAC 123 USD232, MPRO 2-0-1-5]

(?<=...) is called a positive lookbehind assertion. In languages that support variable-length lookbehind (C#, for example), you could use (?<=\\bUSD\\d+). Unfortunately, Java generally needs the lookbehind to have a bounded maximum length, so an open-ended \d+ is not accepted there. We therefore bound the digits with \d{1,99}, which allows from 1 to 99 digits after USD. The + after the } makes it a possessive quantifier, which does not let the regex engine backtrack into the digits, so it always matches the longest possible run of digits and the string is not split in the middle of a number.
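If the bounded lookbehind feels fragile, one alternative (a different technique from the split above, not the method from this answer) is to scan for USD\d+ with Matcher.find() and cut the string at the end of each match. This is only a sketch: the class SplitAfterPrice and the helper splitAfterPrice are hypothetical names, and it assumes every record ends with a USD amount except possibly the last one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitAfterPrice {
    // Outside a lookbehind, an unbounded \d+ is fine in Java.
    private static final Pattern PRICE = Pattern.compile("\\bUSD\\d+");

    // Hypothetical helper: cuts the input right after every "USD<digits>" token.
    static List<String> splitAfterPrice(String input) {
        List<String> parts = new ArrayList<>();
        Matcher m = PRICE.matcher(input);
        int start = 0;
        while (m.find()) {
            // Keep everything from the previous cut up to the end of this match.
            parts.add(input.substring(start, m.end()));
            start = m.end();
        }
        // Whatever follows the last USD amount becomes the final part.
        if (start < input.length()) {
            parts.add(input.substring(start));
        }
        return parts;
    }

    public static void main(String[] args) {
        String totalString = "MAC 4 USD7MAIR 2014 USD1111IMAC 123 USD232MPRO 2-0-1-5";
        System.out.println(splitAfterPrice(totalString));
        // prints: [MAC 4 USD7, MAIR 2014 USD1111, IMAC 123 USD232, MPRO 2-0-1-5]
    }
}
```

Because \d+ is greedy here and there is no lookbehind, no digit limit or possessive quantifier is needed; the trade-off is a few more lines than a one-line split.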

Avinash Raj answered Feb 25 '26 07:02