
How to keep the delimiter while using RegEx?

I asked an earlier question about punctuation and regex, but it was confusing.

Suppose I have this text:

String text = "wor.d1, :word2. wo,rd3? word4!"; 

I'm doing this:

String parts[] = text.split(" ");

And I have this:

wor.d1, | :word2. | wo,rd3? | word4!

What do I need to do to get this instead? (Keep the symbols at the borders of each word as separate tokens, but only the ones I specify: .,!?: and not all punctuation.)

wor.d1 | , | : | word2 | . | wo,rd3 | ? | word4 | !

UPDATE

I'm getting fairly good results with the regex below, but it produces an empty string before every split on punctuation at the start of a word.

Is there a way to avoid this empty string at the start?

Is this regex good, or is there a simpler way?

public static final String PUNCTUATION_SEPARATOR =
        "("
        + "("
        + "(?=^[\"'!?.,;:(){}\\[\\]]+)"
        + "|"
        + "(?<=^[\"'!?.,;:(){}\\[\\]]+)"
        + ")"
        + "|"
        + "("
        + "(?=[\"'!?.,;:(){}\\[\\]]+($|\n))"
        + "|"
        + "(?<=[\"'!?.,;:(){}\\[\\]]+($|\n))"
        + ")"
        + ")";
Renato Dinhani asked Aug 19 '11 21:08



2 Answers

Are you sure you want to use regex? For splitting on single characters there's a faster implementation: StringTokenizer. It can also return the delimiters as tokens.

import java.util.StringTokenizer;

String str = "word1, word2. word3? word4!";
String delim = ",.!?";
// The third argument (true) makes the tokenizer return the delimiters as tokens too.
StringTokenizer st = new StringTokenizer(str, delim, true);
while (st.hasMoreTokens()) {
  String token = st.nextToken();
  // token will be: "word1", ",", " word2", ".", etc.
}
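
Applied to the text from the question, the same idea might look like the sketch below. It assumes the space is added to the delimiter set and then discarded; the class and method names are made up for illustration. Note that StringTokenizer splits at every delimiter, including those inside a word, so "wor.d1" comes back as three tokens rather than one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerDemo {
    // Splits on spaces and the listed punctuation, keeping the punctuation
    // as separate tokens and dropping the spaces.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, " .,!?:", true);
        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            if (!token.equals(" ")) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints: [wor, ., d1, ,, :, word2, ., wo, ,, rd3, ?, word4, !]
        System.out.println(tokenize("wor.d1, :word2. wo,rd3? word4!"));
    }
}
```

So this works well when the symbols never appear inside a word, but it does not by itself give the "borders only" behaviour asked for.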
m_vitaly answered Oct 02 '22 11:10


For simple separators I recommend StringTokenizer. But here's a solution using regex and an auxiliary separator:

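import java.util.regex.Pattern;

String s  = "one,two, three   four ,  five";
// Wrap each run of separators in '#' markers (assumes '#' never occurs in the input).
s = s.replaceAll("([,\\s]+)", "#$1#");
Pattern p = Pattern.compile("#");
// result: "one", ",", "two", ", ", "three", "   ", "four", " ,  ", "five"
String[] result = p.split(s);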
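
If the goal is to separate punctuation only at the borders of each word, as the question asks, one possible sketch is to split on whitespace first and then peel the leading and trailing symbols off each chunk with a Matcher. The class and method names are hypothetical, and the symbol set .,!?: is the one given in the question; punctuation inside a word is left alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BorderPunctuationSplitter {
    // Group 1: leading symbols, group 2: the word itself (lazy),
    // group 3: trailing symbols.
    private static final Pattern CHUNK =
            Pattern.compile("([.,!?:]*)(.*?)([.,!?:]*)", Pattern.DOTALL);

    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        for (String chunk : text.trim().split("\\s+")) {
            Matcher m = CHUNK.matcher(chunk);
            if (!m.matches()) {
                continue; // not expected: the pattern accepts any string
            }
            addEachChar(m.group(1), tokens);
            if (!m.group(2).isEmpty()) {
                tokens.add(m.group(2));
            }
            addEachChar(m.group(3), tokens);
        }
        return tokens;
    }

    // Adds every character of symbols as its own one-character token.
    private static void addEachChar(String symbols, List<String> tokens) {
        for (char c : symbols.toCharArray()) {
            tokens.add(String.valueOf(c));
        }
    }

    public static void main(String[] args) {
        // Prints: wor.d1 | , | : | word2 | . | wo,rd3 | ? | word4 | !
        System.out.println(String.join(" | ", split("wor.d1, :word2. wo,rd3? word4!")));
    }
}
```

Because the word is extracted with a Matcher rather than by splitting on zero-width lookarounds, no empty strings are produced for punctuation at the start of a word.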
mradu answered Oct 02 '22 11:10