
How can I count the number of matches for a regex?

Tags:

java

regex

Let's say I have a string which contains this:

HelloxxxHelloxxxHello 

I compile a pattern to look for 'Hello':

Pattern pattern = Pattern.compile("Hello");
Matcher matcher = pattern.matcher("HelloxxxHelloxxxHello");

It should find three matches. How can I get a count of how many matches there were?

I've tried various loops and matcher.groupCount(), but neither worked.

Tony, asked Sep 11 '11 13:09




2 Answers

matcher.find() does not find all matches, only the next match.

Solution for Java 9+

long matches = matcher.results().count(); 
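For reference, here is a minimal self-contained sketch of the Java 9+ approach using the input from the question. The class name ResultsCount and the helper countMatches are made up for illustration; Matcher.results() returns a stream of MatchResult objects, which is why the count comes back as a long.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ResultsCount {
    // Hypothetical helper: counts non-overlapping matches of regex in input.
    // Requires Java 9 or newer because of Matcher.results().
    static long countMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.results().count();
    }

    public static void main(String[] args) {
        System.out.println(countMatches("Hello", "HelloxxxHelloxxxHello")); // prints 3
    }
}
```

Like the find() loop, this counts non-overlapping matches only, so "aa" in "aaaa" yields 2.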

Solution for Java 8 and older

Call matcher.find() repeatedly and count how many times it succeeds. (From Java 9 onward, the stream-based solution above is more concise.)

int count = 0;
while (matcher.find())
    count++;

Btw, matcher.groupCount() is something completely different: it returns the number of capturing groups in the pattern, not the number of matches.
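To make the difference concrete, the sketch below compares groupCount() with a find() loop on the same input. The class GroupCountDemo and its two helper methods are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountDemo {
    // Number of capturing groups declared in the pattern; does not depend on the input.
    static int groups(String regex, String input) {
        return Pattern.compile(regex).matcher(input).groupCount();
    }

    // Number of non-overlapping matches found in the input.
    static int matches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "HelloxxxHelloxxxHello";
        System.out.println(groups("Hello", input));      // prints 0: the pattern has no parentheses
        System.out.println(groups("(H)(ello)", input));  // prints 2: two capturing groups
        System.out.println(matches("(H)(ello)", input)); // prints 3: three matches in the input
    }
}
```

This is why groupCount() returned 0 for the pattern in the question, regardless of how many times 'Hello' occurs.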

Complete example:

import java.util.regex.*;

class Test {
    public static void main(String[] args) {
        String hello = "HelloxxxHelloxxxHello";
        Pattern pattern = Pattern.compile("Hello");
        Matcher matcher = pattern.matcher(hello);

        int count = 0;
        while (matcher.find())
            count++;

        System.out.println(count);    // prints 3
    }
}

Handling overlapping matches

When counting matches of aa in aaaa, the above snippet will give you 2.

aaaa
aa
  aa

To get 3 matches, i.e. this behavior:

aaaa
aa
 aa
  aa

You have to search for a match at index <start of last match> + 1 as follows:

String hello = "aaaa";
Pattern pattern = Pattern.compile("aa");
Matcher matcher = pattern.matcher(hello);

int count = 0;
int i = 0;
while (matcher.find(i)) {
    count++;
    i = matcher.start() + 1;
}

System.out.println(count);    // prints 3
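A different technique that is sometimes used for overlapping matches, not the one shown above, is to wrap the pattern in a zero-width lookahead. Each match then consumes no input, so a plain find() loop moves forward one character at a time. The sketch below assumes a simple pattern such as aa; the class LookaheadCount and the helper countOverlapping are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadCount {
    // Hypothetical helper: wraps regex in a lookahead, (?=regex), so every
    // position where regex could start counts as one (empty) match.
    static int countOverlapping(String regex, String input) {
        Matcher matcher = Pattern.compile("(?=" + regex + ")").matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOverlapping("aa", "aaaa")); // prints 3
    }
}
```

For simple patterns like this one, the result should agree with the find(i) loop; for patterns with anchors or ones that can match the empty string, the two approaches may behave differently, so treat this as a sketch rather than a drop-in replacement.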
aioobe, answered Sep 25 '22 10:09


This should work for matches that might overlap:

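import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlapCount {
    public static void main(String[] args) {
        String input = "aaaaaaaa";
        String regex = "aa";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(input);
        int from = 0;   // index at which the next search starts
        int count = 0;
        while (matcher.find(from)) {
            count++;
            // Restart one character after the start of this match,
            // so matches that overlap it are still found.
            from = matcher.start() + 1;
        }
        System.out.println(count);    // prints 7
    }
}

Mary-Anne Wolf, answered Sep 22 '22 10:09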