
Regex for matching duplicate consecutive punctuation characters with the exception of 3 periods

I have this regex:

(\p{P})\1 

which successfully matches duplicate consecutive punctuation characters like

;;
,,
\\

but I need to exclude the three-period ellipsis:

...
Artūras asked Sep 16 '13 06:09


3 Answers

Be careful: some approaches fail to match strings of the form .## (a '.' immediately before repeated punctuation), which I assume should match.

This solution satisfies the following requirements:

  1. Repeated punctuation is matched.
  2. Ellipsis (...) is not matched.
  3. Two dots (..) and four or more dots are matched.
  4. Repeated punctuation is matched when preceded or followed by dots, e.g. .##

This is the regex:

(?>(\p{P})\1+)(?<!([^.]|^)\.{3})

Explanation:

  • (?>...) is an atomic group: once it has matched, the engine discards all backtracking positions inside it. So if '...' is rejected by the lookbehind, the engine will not step back and retry with '..' from the same starting position.
  • (\p{P})\1+ matches two or more of the same punctuation character. You already had this, apart from the + quantifier.
  • (?<!([^.]|^)\.{3}) is a negative lookbehind evaluated at the end of the repeated-character match. It fails the match if the preceding three characters are dots that are themselves preceded by a non-dot character or the beginning of the string. This rejects exactly three dots while still allowing two dots, or four or more dots, to match.

The following test cases pass and illustrate use:

string pattern = @"(?>(\p{P})\1+)(?<!([^.]|^)\.{3})";

//Your examples:
Assert.IsTrue( Regex.IsMatch( @";;", pattern ) );
Assert.IsTrue( Regex.IsMatch( @",,", pattern ) );
Assert.IsTrue( Regex.IsMatch( @"\\", pattern ) );
//two and four dots should match
Assert.IsTrue( Regex.IsMatch( @"..", pattern ) );
Assert.IsTrue( Regex.IsMatch( @"....", pattern ) );

//Some success variations
Assert.IsTrue( Regex.IsMatch( @".;;", pattern ) );
Assert.IsTrue( Regex.IsMatch( @";;.", pattern ) );
Assert.IsTrue( Regex.IsMatch( @";;///", pattern ) );
Assert.IsTrue( Regex.IsMatch( @";;;...//", pattern ) ); //With Regex.Matches, the matches contain ;;; and // but not ...
Assert.IsTrue( Regex.IsMatch( @"...;;;//", pattern ) ); //With Regex.Matches, the matches contain ;;; and // but not ...

//Three dots should not match
Assert.IsFalse( Regex.IsMatch( @"...", pattern ) );
Assert.IsFalse( Regex.IsMatch( @"a...", pattern ) );
Assert.IsFalse( Regex.IsMatch( @";...;", pattern ) );                        

//Other tests
Assert.IsFalse( Regex.IsMatch( @".", pattern ) );
Assert.IsFalse( Regex.IsMatch( @";,;,;,;,", pattern ) );  //single punctuation does not match                        
Assert.IsTrue( Regex.IsMatch( @".;;.", pattern ) );
Assert.IsTrue( Regex.IsMatch( @"......", pattern ) );                                       
Assert.IsTrue( Regex.IsMatch( @"a....a", pattern ) );
Assert.IsFalse( Regex.IsMatch( @"abcde", pattern ) );     
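
The comments above mention Regex.Matches. As a minimal sketch of what that looks like, the snippet below collects every matched run using the same pattern. The class name RepeatedPunctuationDemo and the helper FindRuns are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RepeatedPunctuationDemo
{
    // Pattern copied from the answer above.
    public const string Pattern = @"(?>(\p{P})\1+)(?<!([^.]|^)\.{3})";

    // Hypothetical helper: returns the text of every match found in the input.
    public static string[] FindRuns(string input) =>
        Regex.Matches(input, Pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        // The ellipsis in the middle is skipped; the runs around it are reported.
        Console.WriteLine(string.Join(" ", FindRuns(";;;...//"))); // prints: ;;; //
        // Four dots are not an ellipsis, so they are reported as one run.
        Console.WriteLine(string.Join(" ", FindRuns("a....a")));   // prints: ....
    }
}
```

Because the group is atomic and greedy, each run is reported once as a whole rather than as overlapping pairs.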
acarlon answered Nov 09 '22 09:11


To avoid matching ... (three dots), use:

(?<![.])(?![.]{3})(\p{P})\1
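
A quick way to see how this pattern behaves on edge cases is to run it against a few inputs. The sketch below is only an illustration: the class name SecondPatternCheck and the helper Matches are hypothetical, and the inputs were chosen here rather than taken from the answer. Note that the leading lookbehind also rejects any run that starts right after a dot, so .... and .;; do not match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SecondPatternCheck
{
    // Pattern copied from the answer above.
    public const string Pattern = @"(?<![.])(?![.]{3})(\p{P})\1";

    // Hypothetical helper wrapping Regex.IsMatch for this pattern.
    public static bool Matches(string input) => Regex.IsMatch(input, Pattern);

    public static void Main()
    {
        Console.WriteLine(Matches(";;"));   // True: plain repeated punctuation
        Console.WriteLine(Matches(".."));   // True: only two dots follow
        Console.WriteLine(Matches("..."));  // False: the ellipsis is excluded
        Console.WriteLine(Matches("....")); // False: starts with three dots
        Console.WriteLine(Matches(".;;"));  // False: ;; is preceded by a dot
    }
}
```

Whether the last two results are acceptable depends on whether four dots and dot-prefixed runs should count as duplicates in your data.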
Anirudha answered Nov 09 '22 08:11


(?<!\.)(?!\.{3}(?!\.))(\p{P})\1+

This matches any repeated punctuation (including .... or ......, etc.) unless it is exactly three dots (...). For example:

; -- No Match
;; -- Match
,, -- Match
,,,, -- Match
\\ -- Match
... -- No Match
.... -- Match
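
The list above can be reproduced with a small console program. This is a sketch under assumptions: the class name ThirdPatternCheck and the helper Verdict are hypothetical, and the final input .;; is an extra case added here (it is not in the list) to show how the leading (?<!\.) treats a run that follows a dot.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ThirdPatternCheck
{
    // Pattern copied from the answer above.
    public const string Pattern = @"(?<!\.)(?!\.{3}(?!\.))(\p{P})\1+";

    // Hypothetical helper: formats the result the same way as the list above.
    public static string Verdict(string input) =>
        Regex.IsMatch(input, Pattern) ? "Match" : "No Match";

    public static void Main()
    {
        // The first seven inputs come from the list above; ".;;" is an added case.
        string[] inputs = { ";", ";;", ",,", ",,,,", @"\\", "...", "....", ".;;" };
        foreach (string input in inputs)
        {
            Console.WriteLine($"{input} -- {Verdict(input)}");
        }
    }
}
```

With this pattern the added .;; case reports No Match, because the only run in it begins immediately after a dot.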
JonM answered Nov 09 '22 09:11