Fix malformed ellipses in a string

Tags: java, string, regex

I want to fix malformed ellipses (...) in a String.

"Hello.. World.."
"Hello... World..."     // this is correct
"Hello.... World...."
"Hello..... World....."

should all be corrected to:

"Hello... World..."

The following regex handles any instance of 3 or more consecutive .'s:

line.replaceAll("\\.{3,}", "...");

However, I don't know how to handle the case when there are exactly 2 consecutive .'s. We cannot do something like this:

line.replaceAll("\\.{2}", "...");

For example, for "...", the code above returns "....": regex matches never overlap, so it replaces the first 2 .'s (index 0 and 1) with "..." and leaves the third . (index 2) untouched, turning a correct ellipsis into a malformed one.

Something like this works:

line.replaceAll("\\.{2}", "...").replaceAll("\\.{3,}", "...");

...but there must be a better way!

asked Oct 14 '15 at 23:10 by budi


2 Answers

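You can replace any run of two or more dots, matched by:

[.]{2,}

with ... in a single pass. A run of exactly three dots is simply replaced with itself, so correct ellipses stay as they are.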
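As a minimal sketch of how this could look in Java: the class name EllipsisFixer and the method name fixEllipses are hypothetical, chosen only for illustration. The only library call used is String.replaceAll.

```java
final class EllipsisFixer {

    // Hypothetical helper: collapses or expands any run of two or more
    // dots into exactly three dots.
    static String fixEllipses(String line) {
        return line.replaceAll("[.]{2,}", "...");
    }

    public static void main(String[] args) {
        String[] samples = {
            "Hello.. World..",
            "Hello... World...",
            "Hello.... World....",
            "Hello..... World....."
        };
        for (String sample : samples) {
            // Each sample prints as: Hello... World...
            System.out.println(fixEllipses(sample));
        }
    }
}
```

A single dot, such as a sentence-ending period, is not matched because the quantifier requires at least two.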

answered Sep 19 '22 at 21:09 by Moishe Lipsker


Why not keep it simple?

\.\.+

If you really don't want it to touch groups of exactly 3, there's this:

\.{4,}|(?<!\.)\.{2}(?!\.)

The first alternative, \.{4,}, matches groups of 4 or more dots. The second alternative matches groups of exactly 2. The tricky thing about "..." is that it contains two overlapping ".." pairs (index 0-1 and index 1-2). The negative lookahead (?!\.) checks for a 3rd "." right after a pair and rejects the match if one is there, which rules out the first pair. The negative lookbehind (?<!\.) checks for a "." right before a pair and rejects the match if one is there, which rules out the second pair. Together they leave a correct "..." untouched.
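Here is a small sketch of that pattern in Java, assuming a hypothetical class SelectiveEllipsisFixer with helpers named fix and countMatches (none of these names come from the answer). It uses only java.util.regex.Pattern and Matcher, and counts matches to show that a correct "..." is never touched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SelectiveEllipsisFixer {

    // Runs of 4 or more dots, or a run of exactly 2 dots
    // (not preceded and not followed by another dot).
    private static final Pattern MALFORMED =
            Pattern.compile("\\.{4,}|(?<!\\.)\\.{2}(?!\\.)");

    // Hypothetical helper: rewrites only the malformed runs.
    static String fix(String line) {
        return MALFORMED.matcher(line).replaceAll("...");
    }

    // Hypothetical helper: how many runs the pattern would rewrite.
    static int countMatches(String line) {
        Matcher matcher = MALFORMED.matcher(line);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(fix("Hello.. World...."));         // Hello... World...
        System.out.println(countMatches("Hello... World...")); // 0
    }
}
```

The end result is the same as with the simpler pattern; the difference is only that already-correct ellipses are skipped rather than replaced with themselves.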

Negative lookbehind wasn't supported by JavaScript regex engines until ES2018, so I tested this with a tester that uses the Java regex engine.

Link: https://www.myregextester.com/?r=d41b2f7e

answered Sep 23 '22 at 21:09 by Luminous